Stock Market Analysis and Prediction

This document contains a supervisor's recommendation for a project report titled "Stock Market Analysis and Prediction" prepared by four students for their B.Sc. in Computer Science and Information Technology. The supervisor recommends that the report be processed for evaluation as it was prepared under his supervision and fulfills the degree requirements. It also includes a letter of approval from an external supervisor stating that the project satisfies the scope and quality standards for the degree.


SUPERVISOR’S RECOMMENDATION

I hereby recommend that this report, entitled “Stock Market Analysis and Prediction” and
prepared under my supervision by Devendra Adhikari (TU Exam Roll No. 7685/072), Diwash
Subedi (TU Exam Roll No. 7687/072), Jeevan Pokhrel (TU Exam Roll No. 7688/072) and Utsav
Adhikari (TU Exam Roll No. 7710/072) in partial fulfillment of the requirements for the
degree of B.Sc. in Computer Science and Information Technology, be processed for
evaluation.

………………………………
Mr. Ramesh Singh Saud
Project Supervisor
Nagarjuna College of Information Technology
Hariharbhawan, Lalitpur

LETTER OF APPROVAL
This is to certify that this project, prepared by Mr. Devendra Adhikari, Mr. Diwash Subedi,
Mr. Jeevan Pokhrel and Mr. Utsav Adhikari, entitled "Stock Market Analysis and Prediction
Using Time Series Algorithm", in partial fulfillment of the requirements for the degree of
B.Sc. in Computer Science and Information Technology, has been well studied. In our opinion,
it is satisfactory in scope and quality as a project for the required degree.

___________________________
Mr. Ramesh Singh Saud
Project Supervisor
Department of Computer Science and Information Technology
Nagarjuna College of Information Technology
Pulchowk, Lalitpur, Nepal

___________________________
External Supervisor

ACKNOWLEDGEMENTS

The completion of this project would not have been possible without the support and
guidance of many individuals.
We are grateful to Nagarjuna College of Information Technology for its guidance and
supervision, as well as for providing all the necessary support and a friendly environment
for the successful completion of the project.
We would like to express our gratitude to our project supervisor, Ramesh Singh Saud, who
took an interest in our project and guided us through it by providing the necessary
ideas, information and knowledge for developing Stock Market Analysis and
Prediction. We would like to thank Ramesh Singh Saud and Dilli Adhikari for their
encouragement and guidance in preparing this report to the required standard.
We are thankful and fortunate to have had constant support from our colleagues and
the teaching staff of the B.Sc. CSIT department, which helped us complete our project. We
would also like to extend our regards to all the non-teaching staff of the B.Sc. CSIT
department for their timely support.

Devendra Adhikari (7685/072)
Diwash Subedi (7687/072)
Jeevan Pokhrel (7688/072)
Utsav Adhikari (7710/072)

ABSTRACT

Stock Market Analysis and Prediction (SMAP) is a web-based application that predicts
the stock prices of companies based on their market values and the news sentiment
surrounding them. It is a portal where general stock market enthusiasts can keep
track of the companies they have invested in and can instantly contact their brokers to
buy or sell stocks. The main application of the system, however, is to
predict market values. It also provides a news portal and a general
stock-related chatbot.

ARIMA (AutoRegressive Integrated Moving Average) is used for stock market analysis and
prediction. The algorithm's main goal is to learn market trends by training on
past data and then to predict future values. The values calculated by this computational
analysis, i.e. the predictions, are used to display a nearly accurate result.

TABLE OF CONTENTS
SUPERVISOR’S RECOMMENDATION
LETTER OF APPROVAL
ACKNOWLEDGEMENTS
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF ABBREVIATIONS
CHAPTER 1 INTRODUCTION
1.1 Introduction
1.2 Problem Statement
1.3 Objectives
1.4 Scope of the Project
1.5 Limitations
1.6 Report Organization
CHAPTER 2 SYSTEM ANALYSIS
2.1 Literature Review
2.2 Requirement Collection and Analysis
2.2.1 Functional Requirements
2.2.2 User Requirements
2.2.3 System Requirements
2.2.4 Data Requirements
2.2.5 Non-Functional Requirements
2.2.6 Software Requirement
2.3 Feasibility Study
2.3.1 Technical Feasibility
2.3.2 Operational Feasibility
2.3.3 Schedule Feasibility
CHAPTER 3 SYSTEM DESIGN
3.1 System Design
3.1.1 User Interface
3.1.2 System Flow Diagram
3.1.3 Class Diagram
3.1.4 Sequence Diagram
3.1.5 Gantt Chart
CHAPTER 4 IMPLEMENTATION AND TESTING
4.1 Implementation
4.1.1 Algorithm Design
4.1.2 ARIMA
4.1.3 Model Description
4.1.4 Implementation Tools
4.1.5 Other Tools and Platforms
4.2 Testing
4.2.1 Test Case
4.2.2 Test Scripts
CHAPTER 5 CONCLUSION & FUTURE ENHANCEMENTS
5.1 Conclusion
5.2 Future Enhancement
REFERENCES
APPENDIX

LIST OF FIGURES
Fig 2.1: Use Case Diagram of Stock Market Analysis and Prediction System
Fig 2.2: E-R Diagram of Stock Market Analysis and Prediction
Fig 2.3: DFD Level-0 for Stock Market Analysis and Prediction System
Fig 2.4: DFD Level-1 for Stock Market Analysis and Prediction System
Fig 3.1: System Design
Fig 3.2: User Login
Fig 3.3: User Signup
Fig 3.4: System Flow Chart
Fig 3.5: ARIMA Flow Chart
Fig 3.6: Class Diagram of Stock Market Analysis and Prediction
Fig 3.7: Sequence Diagram of Stock Market Analysis and Prediction
Fig 3.8: Gantt Chart
Fig 4.1: Company Data

LIST OF ABBREVIATIONS

ACF: Auto-Correlation Function
AIC: Akaike Information Criterion
AIML: Artificial Intelligence Markup Language
AR: Auto Regressive
ARIMA: Auto Regressive Integrated Moving Average
DFD: Data Flow Diagram
GUI: Graphical User Interface
HTML: Hypertext Markup Language
MA: Moving Average
MAPE: Mean Absolute Percentage Error
NEPSE: Nepal Stock Exchange
PACF: Partial Auto-Correlation Function
RMSE: Root Mean Square Error
SARIMA: Seasonal Auto Regressive Integrated Moving Average
SMAP: Stock Market Analysis And Prediction

CHAPTER 1 INTRODUCTION

1.1 Introduction

Stock analysis is the evaluation of a particular trading instrument, an investment sector, or
the market as a whole. Stock analysts attempt to determine the future activity of an
instrument, sector, or market [1]. Stock market prediction is the act of trying to determine
the future value of a company stock or other financial instrument traded on an exchange.
The successful prediction of a stock's future price could yield significant profit.
The project entitled “Stock Market Analysis and Prediction” is a web-based application.
It forecasts the future of the stock market based on historical time series data. NEPSE
historical time series data were scraped using Scrapy and stored. Machine learning
models for time series forecasting were trained on those historical data, and the result is
visualized on a web page for easy understanding and analysis of the stock market. The project
draws on concepts from data mining and statistics and makes heavy use of NumPy,
Pandas and data visualization libraries for data processing. In short, the system accepts the
historical data set of a company, processes it on our local server and displays the result
in a web browser. Since it is a web application, it can be accessed by anyone over the
internet once it is hosted on a domain.

The project is targeted at companies whose stock is traded, to help them predict and analyze
their financial status and future. It is also aimed at general individuals who want to
understand the patterns of the stock market and invest their money.

The closing value is the price at which the most recent trade occurred. While the stock
market is open (the Nepal Stock Exchange is open Sunday through Friday, 11:00 a.m. to
3:00 p.m., and is closed on public holidays), the closing value provides the most up-to-date
value of a stock. Odd lot trading is done on Fridays. Once the stock market closes, the
closing price is the best gauge of value until the stock market opens on the next business day.

1.2 Problem Statement

Predicting stocks has never been an easy job. Since the concept emerged, dating back to
the development of the New York Stock Exchange in 1817, major attempts at stock
prediction have been made both with and without the use of computing systems.

The market is said to be unpredictable, so that no one can reliably benefit from
analysis based on its data. The structure of the market and its environment
keeps investors from windfall gains: because information about the market is publicly
available, it is unlikely that the same investor will consistently obtain the best prices.

Stock values change day by day depending on market conditions. The challenge
is to guide investors on the right time to buy and sell shares. Many
regression and classification techniques are available for prediction. Effort is needed to
determine which technique gives the best results in predicting stock prices and produces
accurate trends.

1.3 Objectives

The main objectives of the Stock Market Analysis and Prediction project are:
• To predict the future value of company stock
• To analyze the current state of the market
• To identify factors affecting the stock market
• To make analysis easy for the general public
• To visualize the share market with the help of interactive charts
• To implement machine learning models

1.4 Scope of the Project

The scope of this project includes:
• Showing live market status
• Classifying the polarity of financial news
• Helping new investors invest in the stock market

1.5 Limitations

The limitations of the project include:
• Analysis is based only on the closing value.
• Accuracy is above 90%, but 100% accuracy cannot be achieved.

1.6 Report Organization


This report is divided into 5 chapters. Each chapter is further divided into different
headings. Chapter 1 gives the introduction. The problem definition, objectives, scope and
limitations of the system are discussed there.

Chapter 2 focuses on the analysis. It contains the literature review section, where
research work done in the field of stock market prediction is discussed in brief. This
chapter also includes requirement analysis, the feasibility study and the system structure.

Chapter 3 discusses the design of the system in detail. It provides information about
the database schema. The chapter also discusses process design and input/output design.

Chapter 4 gives information about the implementation and testing process. It discusses
how the system is implemented and what tools and software are used to implement it.
The testing process is also covered in detail in this chapter.

Chapter 5 includes the conclusion and future enhancements.

CHAPTER 2 SYSTEM ANALYSIS

2.1 Literature Review

No review of the literature would be complete without Burton G. Malkiel's random walk
theory of the stock market. According to the author, the stock market moves in a random
fashion, and no previous or historical data can be used to predict its future
values. According to the author, the market is efficient and will remove any kind of bias or
pattern. However, many studies have provided evidence that such
prediction not only works but beats the traditional methods by a long shot.[2]
Aishwarya Singh forecast time series data using time series analysis models. She
implemented different models such as MA, AR, ARIMA and LSTM. According to her, LSTM
is best for large data sets, and ARIMA is suitable for smaller ones (about 800 data points
on average).[3]
Hirotaka Mizuno, Michitaka Kosaka and Hiroshi Yajima demonstrated the use of an artificial
neural network (ANN) on TOPIX (Tokyo Stock Exchange Prices Index). They used the moving
average, the deviation of price from the moving average, the psychological line and the
relative strength index as inputs for the ANN. The outputs of the ANN were buy, hold and
sell signals. Their results showed that their system could achieve an average return of
9-10%, which was lower than that of the traditional buy-and-hold strategy.
However, Marijana Zekic has pointed out that many authors ignore the choice of ANN
structure, which could be beneficial in certain situations. The author demonstrates that
certain types of ANN structure perform better than others, such as the 10-20-1 structure
with backpropagation learning.[4]

Fernando Fernández-Rodríguez, Christian González-Martel and Simón Sosvilla-Rivero have
demonstrated profitability in different phases of the market (bullish, bearish and
neutral). Their work demonstrates that technical analysis performs far better than the
buy-and-hold strategy in different market conditions.
The use of the Discrete Wavelet Transform (DWT) and an Artificial Neural Network (ANN) for
predicting financial time series has been studied by S. Kumar Chandar, M. Sumathi and S. N.
Sivanandam. Their hybrid forecasting technique achieved better results than
the approach that does not use the wavelet transform.

2.2 Requirement Collection and Analysis

Requirement collection plays a vital role in the management and development
of any project. Having a clear idea of what the project is supposed to deliver at the end
of the term makes project managers and developers aware of the steps to be
taken to complete the job. In this project we collect the stock data of
different companies from merolagani.com, which is used to analyze current values and
predict future ones. Our project mainly focuses on forecasting the future value of stocks
in which the user (customer) can invest. For this project, we took into account two major
requirement criteria: functional requirements and non-functional requirements.

2.2.1 Functional Requirements


Functional requirements are what the system must provide to meet the business need. On
this basis, the system:
• Should be able to generate an approximate share price.
• Should collect acceptable and accurate data from the Merolagani site.
• Should have an easy interface for the users.

2.2.1.i Use Case Diagram:


Actor 1: User
Description: Users must sign up to have full access to the system. Users log in with their
username and password. Some features are unavailable to users who are not logged in to
the system, but users without an account can still view market
information. Authorized users can calculate predictions for different companies, use the
feedback feature and stay updated on stock news.

Actor 2: Admin
Description: Admins are responsible for verifying user registration and for user
management in the system. Market information is updated in the system by the admin. All
information about the stocks is handled by the admin.

FIG 2.1: USE CASE DIAGRAM OF STOCK MARKET ANALYSIS AND PREDICTION

2.2.1.1 Data Modeling


2.2.1.1.i E-R Diagram
The E-R diagram shows how the entities are related to each other. The system consists
mainly of four entities: admin, users, prediction and company stock. The admin monitors
both the company stock and the users, and is responsible for generating the prediction value
for a specified company stock. Admin has attributes such as id, username and
password. Company stock has attributes such as name, id, close value, symbol and date.
Similarly, prediction has attributes such as id, date, predict_value and actual_value,
and users has attributes such as username, id and password. The E-R diagram clearly
illustrates the relationships between all the entities in the system, which provides
a clear vision of the system.
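
The entities described above could be expressed as a relational schema. The following is a minimal sketch only: it uses Python's built-in sqlite3 module as a stand-in for the MySQL database named elsewhere in this report, and the table names, column types and the foreign key from prediction to company stock are assumptions, since the text gives only the attribute names.

```python
import sqlite3

# Hypothetical schema for the four entities in the E-R description.
# Only the attribute names come from the report; types and keys are assumed.
# A real system should store a password hash, not the password itself.
SCHEMA = """
CREATE TABLE admin (
    id INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE,
    password TEXT NOT NULL
);
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE,
    password TEXT NOT NULL
);
CREATE TABLE company_stock (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    symbol TEXT NOT NULL,
    date TEXT NOT NULL,
    close_value REAL NOT NULL
);
CREATE TABLE prediction (
    id INTEGER PRIMARY KEY,
    company_stock_id INTEGER REFERENCES company_stock(id),
    date TEXT NOT NULL,
    predict_value REAL NOT NULL,
    actual_value REAL
);
"""


def create_schema(conn):
    """Create all four tables on the given connection."""
    conn.executescript(SCHEMA)
    conn.commit()


def table_names(conn):
    """Return the names of the user-defined tables, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
create_schema(conn)
print(table_names(conn))
```

The actual_value column is left nullable in this sketch because the real closing value is presumably not known at the time a prediction is stored.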

FIG 2.2: E-R DIAGRAM OF STOCK MARKET ANALYSIS AND PREDICTION

2.2.1.2 Process Modeling


Data Flow Diagram (DFD)
A DFD maps out the flow of information for any process or system. Fig 2.3 shows the
level 0 DFD, which simply shows that users interact with the system to get the desired result.
Fig 2.4 shows the level 1 DFD, which provides a more detailed breakdown of the
information in the level 0 DFD. The flow of data through the system is as
follows:

1. Data Retrieval & Transformation: A scraping process is carried out to retrieve the
data from Merolagani as CSV files.
2. Predictive Analysis: Formatted data in Excel are used for predictive analysis.
3. Predictive Model Generation Algorithm: The ARIMA algorithm is used to generate a
model to predict the value.
4. Charts Generation: Predicted trends are illustrated in charts for better
understanding and representation.
5. Training of Data and Prediction: Using the data and the algorithm, the model is
trained and made capable of predicting the stock price.
6. Data Validation and Results Generation: The results are tested for error, i.e. a
validation process is carried out, and afterwards the results are generated.
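
The validation step above can be made concrete with the two error measures listed in the abbreviations, MAPE and RMSE. The following is a minimal sketch in plain Python; the function names and the sample numbers are illustrative assumptions, not values or code from the project.

```python
import math


def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent; actual values must be non-zero."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error, in the same units as the data."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(total / len(actual))


# Illustrative numbers only, not results from the project.
actual_close = [100.0, 200.0, 400.0]
predicted_close = [110.0, 190.0, 400.0]
print(f"MAPE: {mape(actual_close, predicted_close):.2f}%")
print(f"RMSE: {rmse(actual_close, predicted_close):.2f}")
```

MAPE is scale-free, which makes it convenient for comparing companies with very different share prices, while RMSE stays in price units and penalizes large misses more heavily.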

FIG 2.3: DFD LEVEL-0 FOR STOCK MARKET ANALYSIS AND PREDICTION SYSTEM

FIG 2.4: DFD LEVEL-1 FOR STOCK MARKET ANALYSIS AND PREDICTION

2.2.2 User Requirements

• The user shall be able to register, log in and log out of the system.
• The user shall be able to view the stock market's daily data and historical data.
• The user shall be able to search for a specific company listed on Merolagani.
• The user shall be able to view market data of a specific company.
• The user shall be able to send feedback.
• The user shall be able to view the latest financial news about the stock market.
• The user shall be able to analyze different companies' market conditions with the
help of the historical data and financial news.
• The user shall be able to view all the listed company shares he/she owns.

2.2.3 System Requirements
The system shall display the daily market data of each company listed on
Merolagani.

The system shall display historical market data of each company. This data shall be
represented in both numerical and chart format. Similarly, the system shall also display
current and historical financial news related to the stock market.

The financial news obtained shall be tagged by positive or negative polarity,
the date of announcement of the news and the company the news relates to.

The system shall build a prediction model on the basis of the historical market data. This
prediction model shall be used to predict the rise or fall of the market of a specific
company in the future.

To predict tomorrow's market data for a company, the prediction model shall take
today's market data for that company as input. The output of the model shall be an
indication of either a rise or a fall in the market of that company.
The predicted indication shall be displayed to the users.
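
The rise-or-fall indication described above reduces to comparing the predicted value with today's closing value. The following is a minimal sketch of that comparison; the function name and the handling of an exact tie are assumptions, as the text only mentions rise and fall.

```python
def indicate_direction(today_close, predicted_close):
    """Return 'rise' or 'fall' by comparing the prediction with today's close.

    Assumption: an exact tie is reported as 'no change', a case the
    requirement text does not cover.
    """
    if predicted_close > today_close:
        return "rise"
    if predicted_close < today_close:
        return "fall"
    return "no change"


# Illustrative values only.
print(indicate_direction(505.5, 512.0))
print(indicate_direction(505.5, 498.0))
```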

2.2.4 Data Requirements


The scraped company stock data will contain the date and the closing value. The scraped
data is stored in CSV file format and then transferred to the database for training the
prediction model. Similarly, the data is also stored in a MySQL database for display in the
system. Before the application is used, the database shall be updated to the latest market
values and news. The charts and comparisons of the companies will be made only on the
basis of the latest data.

The predicted indication of a rise or fall in market data will be stored in the database
before display.
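
As an illustration of the data requirement above, the following sketch reads date and closing-value pairs from CSV text using only Python's standard library. The column names `date` and `close`, the `YYYY-MM-DD` date format and the sample rows are assumptions; the actual layout of the scraped files is not given in this section.

```python
import csv
import io
from datetime import datetime


def load_closing_values(csv_file):
    """Read (date, close) rows from an open CSV file, sorted by date.

    Assumes a header row with 'date' and 'close' columns (hypothetical names).
    """
    rows = []
    for record in csv.DictReader(csv_file):
        day = datetime.strptime(record["date"], "%Y-%m-%d").date()
        # Drop thousands separators such as "1,250.00" before converting.
        close = float(record["close"].replace(",", ""))
        rows.append((day, close))
    rows.sort(key=lambda row: row[0])
    return rows


# Made-up sample rows, deliberately out of date order.
sample = io.StringIO('date,close\n2019-03-04,512\n2019-03-03,505.5\n2019-03-05,"1,020.25"\n')
series = load_closing_values(sample)
for day, close in series:
    print(day.isoformat(), close)
```

Sorting by date matters because a time series model depends on the order of observations, and scraped pages often list the newest rows first.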

2.2.5 Non-Functional Requirements


Reliability: The reliability of the product will depend on the accuracy of the data:
date of purchase, how much stock was purchased, high and low value range, and
opening and closing figures. The stock data used in training would also determine the
reliability of the software.

Security: The user will only be able to access the website for inserting stock prices
using his or her login details, and will not be able to access the computations happening
at the back end.

Maintainability: Maintaining the product would require retraining the software
on recent data so that the recommendations are up to date. The database has to be updated
with recent values.

Portability: The website is completely portable, and the recommendations are completely
trustworthy as the data is dynamically updated.

Interoperability: The interoperability of the website is very high because it synchronizes
all the databases with the server.

2.2.6 Software Requirement


Being a web application, the system depends only on the web browser. The
system nevertheless sets out the following requirements for the operating system and web
browser.
Operating System: Windows, Linux, Mac OS
Web Browser: Safari, IE (8.0 or above), Edge, Mozilla Firefox (3.0 or above), Google
Chrome

2.3 Feasibility Study


A feasibility study is the study of how successful a project can be, accounting for
economic, technological, legal and scheduling factors. Project managers use
feasibility studies to determine the positive or negative outcomes of a project before
making any investment in it. The various feasibility analyses are included below.

2.3.1 Technical Feasibility:
The user requirements are easily met by the system, and the system is technically feasible
to work on. The system uses Django as the web framework, coupled with MySQL as
the database server.

2.3.2 Operational Feasibility


Since the system could very probably be converted into a Decision
Support System, no resistance to its operation is expected among the user
groups.

2.3.3 Schedule Feasibility


Since this is a small project and has dependencies, we follow the agile
methodology for developing the system. During the development process, small
iterative changes are made to the system.

CHAPTER 3 SYSTEM DESIGN
3.1 System Design
System design is the overall design of the system. The system design
parameters, once set, are especially useful for the micro process of system development,
converting the product from a blueprint into an actual application. This document contains
the overall design of the system. The system will be built on a 3-tier architecture:

[Diagram: three tiers: Client (user running web browser); Web Server (running web application); Database Management]

FIG 3.1: SYSTEM DESIGN

3.1.1 User Interface


An interactive and easy-to-use user interface is the goal of the system. The design doesn’t
contain any ambiguous spaces and is self-explanatory.

FIG 3.2: USER LOGIN

FIG 3.3: USER SIGNUP

3.1.2 System Flow Diagram


The system flow chart (Fig 3.4) describes the working method of the system: the user
chooses a company whose value is to be predicted, then the ARIMA algorithm runs and
generates a result, which is shown in the charts.
The ARIMA algorithm flow chart (Fig 3.5) is also described here. First we choose our data
set, which is in CSV format. The data set is then checked for stationarity. If it is not
stationary, we use the differencing method to make it stationary. Once it is stationary,
we use the ACF and PACF to choose the parameters of the model: p and q are read from the
PACF and ACF, while d is the number of times the series was differenced. We fit the
parameters to our model and train it. The predicted values obtained are used
to evaluate the accuracy of the model using MAPE. The flow charts thus show the working
method of the algorithm and the system.
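
The flow described above can be sketched in plain Python. This is a deliberately simplified illustration, not the project's implementation: it differences the series once, computes a sample autocorrelation, and fits only an autoregressive term by least squares, which corresponds to an ARIMA(1,1,0)-style model. It has no formal stationarity test and no moving-average term, and all function names and sample values are assumptions.

```python
def difference(series, d=1):
    """Apply first-order differencing d times."""
    out = list(series)
    for _ in range(d):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out


def acf(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        return 0.0
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    x, y = series[:-1], series[1:]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        return mean_y, 0.0  # constant input: fall back to the mean
    phi = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return mean_y - phi * mean_x, phi


def forecast_next(closes):
    """One-step-ahead forecast: fit AR(1) on the differences, then undo the differencing."""
    diffs = difference(closes, 1)
    c, phi = fit_ar1(diffs)
    next_diff = c + phi * diffs[-1]
    return closes[-1] + next_diff


# Made-up closing values with a growing trend.
closes = [10.0, 11.0, 13.0, 16.0, 20.0]
print("lag-1 ACF of raw series:", acf(closes, 1))
print("differenced series:", difference(closes, 1))
print("next-day forecast:", forecast_next(closes))
```

In practice a statistical library would normally handle the stationarity test, the order selection and the full ARIMA fit; the sketch only shows how differencing, autocorrelation and the final forecast relate to one another.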

FIG 3.4: SYSTEM FLOW CHART
FIG 3.5: ARIMA ALGORITHM FLOW CHART

3.1.3 Class Diagram


Classes in class diagrams are represented by boxes that are partitioned into three:
1. The top partition contains the name of the class.
2. The middle part contains the class’s attributes.
3. The bottom partition shows the possible operations that are associated with the class.
In this diagram, the user class has attributes such as id, username, password, first_name,
last_name and email. Many users can be added using the addUser operation. One user can
access many stock prediction prices. The stock class has attributes such as id, obs_data
and date. Operations such as adding, deleting and viewing stock can be performed.
Each company differs in its stock prices. The company class has attributes such as id,
company_name, email and symbol, and supports operations such as adding a company and
extracting company information. One company can have multiple company data records,
each with attributes such as id, close, obs_data and date. Users can view
the data and date of a company. One company can have several news items, each with
attributes such as id, title, image, detail, date and author, on which the operation of
viewing news can be performed.
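
The one-to-many relationships described above (one company with many data records and many news items) could be sketched with Python dataclasses as follows. The attribute names come from the text, but the types, the method names and the sample values are assumptions made for illustration.

```python
import datetime
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CompanyData:
    id: int
    close: float
    obs_data: float  # assumption: numeric; the text does not define this attribute
    date: datetime.date


@dataclass
class News:
    id: int
    title: str
    image: str  # assumption: a path or URL to the image
    detail: str
    date: datetime.date
    author: str


@dataclass
class Company:
    id: int
    company_name: str
    email: str
    symbol: str
    data: List[CompanyData] = field(default_factory=list)  # one company, many records
    news: List[News] = field(default_factory=list)  # one company, many news items

    def add_data(self, record: CompanyData) -> None:
        """Attach one more data record to this company."""
        self.data.append(record)

    def latest_close(self) -> Optional[float]:
        """Closing value of the most recent record, or None if there is none."""
        if not self.data:
            return None
        return max(self.data, key=lambda record: record.date).close


# Hypothetical company and values.
company = Company(1, "Example Bank", "info@example.com", "EXB")
company.add_data(CompanyData(1, 505.5, 505.5, datetime.date(2019, 3, 3)))
company.add_data(CompanyData(2, 512.0, 512.0, datetime.date(2019, 3, 4)))
print(company.symbol, company.latest_close())
```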

FIG 3.6: CLASS DIAGRAM OF STOCK MARKET ANALYSIS AND PREDICTION

3.1.4 Sequence Diagram
A sequence diagram shows object interactions arranged in time sequence. It depicts the
objects and classes involved in the scenario and the sequence of messages exchanged
between the objects needed to carry out the functionality of the scenario. Fig 3.7 shows
the sequence diagram for our system. The admin logs in to the system,
where admin information is stored in the database, and is then allowed to add market
information. A user first registers with the system; the user's information is checked by
the admin and stored in the database, and a registration-successful acknowledgement is
sent to the user. The user can then log in, and system access is granted once the login
details are validated. After logging in, users can add data on the stocks they own and
store it in the database. They can view today's market by sending a query to the system;
the query is accepted and the market information is displayed. Users can view predictions
and can also run the calculation to predict a particular stock. Users can log out of the
system when they have finished using it.

FIG 3.7: SEQUENCE DIAGRAM OF STOCK MARKET ANALYSIS AND PREDICTION

3.1.5 Gantt Chart
A Gantt chart is a type of chart that illustrates a project schedule. It is similar to an activity
diagram in that it shows the scheduled duration of the tasks needed to finish the project. The
figure below shows the Gantt chart for our system. We started our project on
02/24/2019 and continued through different processes to finish it. The chart shows that it
took 10 days for requirement gathering, 20 days for analysis, 15 days for design, 32
days for coding, 13 days for testing and 33 days for implementation, while documentation was
carried out throughout every process.

FIG 3.8: GANTT CHART
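
The durations above can be added up with a few lines of Python. The sketch below assumes that the phases run strictly one after another with no overlap, which the text does not state, so the computed dates are only indicative; the function name schedule is made up for this example.

```python
from datetime import date, timedelta

start = date(2019, 2, 24)
# phase durations in days, as stated in the text
phases = [
    ("requirement gathering", 10),
    ("analysis", 20),
    ("design", 15),
    ("coding", 32),
    ("testing", 13),
    ("implementation", 33),
]


def schedule(start, phases):
    """Return (name, begin, end) per phase, assuming strictly sequential phases."""
    rows = []
    begin = start
    for name, days in phases:
        end = begin + timedelta(days=days)
        rows.append((name, begin, end))
        begin = end
    return rows


rows = schedule(start, phases)
total_days = sum(days for _, days in phases)
for name, begin, end in rows:
    print(f"{name:22s} {begin} -> {end}")
print("total days:", total_days)
```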

CHAPTER 4 IMPLEMENTATION AND TESTING

4.1 Implementation
The main purpose of implementing this system is to predict stock prices based on
previous stock prices.

4.1.1 Algorithm Design


Algorithms are the operational infrastructure of every project; they determine
how the program operates and generates results based on the calculations. An
effective algorithm must encompass all the data variables available for computation and in
return generate an efficient flow as well as true results of the processing afterwards. When
it comes to predictive analysis, there is a myriad of choices available over the internet that
operate on statistical data to generate associative output. Choosing between these numerous
algorithms itself needs a good amount of study of the topics and also a deep analysis of
the predictions being made by the system. Since in this case there are multiple
dependent variables that are key points for prediction, we have adopted the ARIMA
algorithm.

Data Collection
In the first phase, a number of scraping scripts were written to collect data from the sources
mentioned previously in the project. The data is composed of the market data of companies.

FIG 4.1: COMPANY DATA

4.1.2 ARIMA
One of the most common methods used in time series forecasting is known as the ARIMA
model, which stands for Autoregressive Integrated Moving Average. ARIMA is a model that
can be fitted to time series data in order to better understand or predict future points in the
series.

There are three distinct integers (p, d, q) that are used to parametrize ARIMA models.
Because of that, ARIMA models are denoted with the notation ARIMA(p, d, q). Together
these three parameters account for seasonality, trend, and noise in datasets:

• p is the auto-regressive part of the model. It allows us to incorporate the effect of past
values into our model. Intuitively, this would be similar to stating that it is likely to be warm
tomorrow if it has been warm the past 3 days.
• d is the integrated part of the model. This includes terms in the model that incorporate the
amount of differencing (i.e. the number of past time points to subtract from the current
value) to apply to the time series. Intuitively, this would be similar to stating that it is likely
to be the same temperature tomorrow if the difference in temperature in the last three days has
been very small.
• q is the moving average part of the model. This allows us to set the error of our model as a
linear combination of the error values observed at previous time points in the past.

The equation of ARIMA(2,0,1) is:

$Y_t = a_1 Y_{t-1} + a_2 Y_{t-2} + b_1 E_{t-1}$

where the AR term is $a_1 Y_{t-1} + a_2 Y_{t-2}$ and the MA term is $b_1 E_{t-1}$.

In our project, Y is the observed stock value at time stamp t, and the values of p, d and q
are chosen as necessary to obtain high accuracy.
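
As a worked illustration of the ARIMA(2,0,1) equation, the sketch below evaluates the AR and MA terms for one time step. The coefficients and input values are hypothetical numbers chosen for readability; in practice the coefficients are estimated when the model is fitted.

```python
def arima_201_forecast(y_prev1, y_prev2, e_prev1, a1, a2, b1):
    """One-step forecast built from the AR(2) term and the MA(1) term."""
    ar_term = a1 * y_prev1 + a2 * y_prev2
    ma_term = b1 * e_prev1
    return ar_term + ma_term


# hypothetical coefficients and past values, for illustration only
forecast = arima_201_forecast(
    y_prev1=100.0,  # Y at t-1
    y_prev2=98.0,   # Y at t-2
    e_prev1=2.0,    # error observed at t-1
    a1=0.6, a2=0.4, b1=0.5,
)
print(round(forecast, 2))
```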

The algorithm is implemented in the following order:

Step 1: Check stationarity: If a time series has a trend or seasonality component, it must be
made stationary before we can use ARIMA to forecast.

Step 2: Difference: If the time series is not stationary, it needs to be stationarized through
differencing. Take the first difference, then check for stationarity. Take as many differences
as it takes. Make sure you check seasonal differencing as well.

If d = 0: $y_t = Y_t$

If d = 1: $y_t = Y_t - Y_{t-1}$

If d = 2: $y_t = (Y_t - Y_{t-1}) - (Y_{t-1} - Y_{t-2}) = Y_t - 2Y_{t-1} + Y_{t-2}$

Here, $y_t$ is the differenced value that is calculated to make the data stationary.

Step 3: Filter out a validation sample: This will be used to validate how accurate our
model is. Use a train/test validation split to achieve this.

Step 4: Select AR and MA terms: Use the ACF and PACF to decide whether to include
AR term(s), MA term(s), or both.

Step 5: Build a model: Build the model and fit it to the training data.

Step 6: Validate the model: Compare the predicted values to the actual values in the validation
sample.

Step 7: Calculate the RMSE or MAPE of the prediction to check accuracy.
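
The differencing used in Step 2 can be sketched in a few lines of plain Python. The function name difference and the sample values are assumptions made for this illustration; a library routine would normally be used on real data.

```python
def difference(series, d=1):
    """Apply first-order differencing d times (d = 0 returns the series unchanged)."""
    values = list(series)
    for _ in range(d):
        values = [values[i] - values[i - 1] for i in range(1, len(values))]
    return values


Y = [10, 12, 15, 19, 24]  # illustrative values
print(difference(Y, 0))
print(difference(Y, 1))
print(difference(Y, 2))
```

Each pass shortens the series by one value, and the d = 2 result equals Y_t - 2Y_{t-1} + Y_{t-2} computed directly from the original series.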

So, we have to deal with either trend or seasonality. When dealing with seasonal effects, we
make use of the seasonal ARIMA, which is denoted as ARIMA(p,d,q)(P,D,Q)s. Here, (p,
d, q) are the non-seasonal parameters described above, while (P, D, Q) follow the same
definition but are applied to the seasonal component of the time series. The term s is the
periodicity of the time series (4 for quarterly periods, 12 for yearly periods, etc.).
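
To make the role of the periodicity s concrete, the following sketch applies one seasonal difference, that is, it subtracts the value observed s steps earlier. The function name and the quarterly sample data are made up for this illustration.

```python
def seasonal_difference(series, s):
    """Seasonal differencing: y_t = Y_t - Y_(t-s)."""
    return [series[i] - series[i - s] for i in range(s, len(series))]


# two illustrative years of quarterly values with the same seasonal shape
quarterly = [10, 20, 30, 40, 12, 22, 32, 42]
print(seasonal_difference(quarterly, 4))
```

With s = 4 the repeating quarterly pattern cancels out and only the year-over-year change remains.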

Parameter selection for the ARIMA time series model: when fitting time series data with
an ARIMA model, our first goal is to find the values of ARIMA(p,d,q) that optimize a metric
of interest. In this section, we resolve this issue by writing Python code to
programmatically select the optimal parameter values for our ARIMA(p,d,q) time series
model. Along with those parameters, we use the closing value of the time series stock data
as the feature to predict the future value. The same approach is followed for seasonal ARIMA.

We will use a "grid search" to iteratively explore different combinations of parameters. For
each combination of parameters, we fit a new ARIMA model with the SARIMAX()
function from the statsmodels library and assess its overall quality. Once we have explored the
entire landscape of parameters, our optimal set of parameters will be the one that yields the
best performance for our criteria of interest. In Statistics and Machine Learning, this
process is known as grid search (or hyperparameter optimization) for model selection.
When evaluating and comparing statistical models fitted with different parameters, each
can be ranked against one another based on how well it fits the data or its ability to
accurately predict future data points.
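
The grid search described above can be sketched as follows. The scoring function here is a stand-in that only mimics an error value; in the real system the score would come from fitting a model for each order and measuring its error. The names best_order and toy_score are assumptions made for this example.

```python
import itertools


def best_order(orders, score):
    """Return (order, error) for the candidate order with the smallest score."""
    results = []
    for order in orders:
        try:
            results.append((score(order), order))
        except Exception:
            continue  # skip combinations that cannot be evaluated
    error, order = min(results)
    return order, error


# every (p, d, q) combination with each parameter taken from {0, 1}
p = d = q = range(0, 2)
orders = list(itertools.product(p, d, q))


def toy_score(order):
    # stand-in for "fit the model and return its error"
    return abs(order[0] - 1) + abs(order[1] - 1) + order[2]


print(len(orders), "candidate orders")
print(best_order(orders, toy_score))
```

The number of candidates grows multiplicatively with each parameter range, which is why the ranges are kept small.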

We use MAPE or RMSE as the error measure. Neither is returned by the models fitted with
statsmodels; both are calculated by comparing the predicted values with the observed values
of the validation sample. MAPE expresses the average absolute error as a percentage of the
observed values, while RMSE expresses the error in the units of the series. A model whose
predictions are closer to the observed values is assigned a lower MAPE or RMSE.
Therefore, we are interested in finding the model that yields the lowest MAPE or
RMSE value. The ARIMA order and seasonal order with the lowest MAPE value are used with
the SARIMAX model in the seasonal case, while only the ARIMA order is used in the trend
case, to fit the model on the history values and predict the future value.
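
Both error measures are short formulas. The sketch below computes them in plain Python on made-up observed and predicted values; note that MAPE is undefined when an observed value is zero, which is not an issue for stock prices.

```python
import math


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the series."""
    squares = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squares) / len(squares))


observed = [100.0, 200.0, 400.0]   # illustrative values
predicted = [110.0, 180.0, 400.0]
print(round(mape(observed, predicted), 2))
print(round(rmse(observed, predicted), 2))
```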

Along with the plot for prediction, we plot diagnostic plots to ensure that none of the
assumptions made by the model are violated.
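
One assumption that such diagnostics commonly examine is that the residuals of the fitted model have a mean near zero and are not correlated with their own past values. The sketch below checks this numerically on made-up residuals; it is a simplified stand-in for the diagnostic plots, and the function name autocorrelation is an assumption of this example.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of a sequence at the given lag."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values)
    cov = sum((values[i] - mean) * (values[i - lag] - mean) for i in range(lag, n))
    return cov / var


residuals = [0.5, -0.3, 0.1, -0.4, 0.2, -0.1, 0.3, -0.3]  # illustrative values
mean_residual = sum(residuals) / len(residuals)
print("mean residual:", round(mean_residual, 4))
for lag in (1, 2, 3):
    print("lag", lag, "autocorrelation:", round(autocorrelation(residuals, lag), 3))
```

Autocorrelations that stay close to zero at every lag suggest that the model has captured the structure of the series; large values indicate information left in the residuals.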

4.1.3 Module Description:
Various design tools are available for creating figures and diagrams such as entity-
relationship diagrams, flow charts, use case diagrams and other diagrams. In this project,
Microsoft Visio Professional was used for the diagrammatic design of the proposed
system.

4.1.3.1. Data training and prediction:


Time series data is scraped from merolagani.com and saved in CSV format. The scraped data
is used as input to the ARIMA model. The close value from the CSV file is passed to
the model for fitting and prediction. Along with the close value, three other parameters, on
which our accuracy depends, are passed to the model.
Those three parameters are the autoregressive parameter, the integrated parameter and the
moving average parameter. During the same operation, the data is also inserted into the database.
The basic script is shown below:

import itertools
import os

import numpy as np
import pandas as pd
from django.conf import settings
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX

from .models import Company, Data  # module path assumed for this excerpt


def home(request):
    # load the scraped CSV, indexed by trading date
    file_path = os.path.join(
        settings.BASE_DIR, 'files_system', settings.FILE_TO_USE[0])
    df = pd.read_csv(file_path, parse_dates=['Date'], index_col='Date')
    series = df['Close']
    X = series.values
    company = Company.objects.last()
    # inserting closing value to Data table in database
    for value, day in zip(X, series.index):
        Data.objects.create(obs_data=value, date=day, company=company)
    # selecting test and train data
    size = int(len(series) * 0.98)
    train, test = series.iloc[:size], series.iloc[size:]
    predn_date = test.index
    history = list(train)
    predictions = []
    a = []
    # call evaluate_models to get the order that fits best; the arguments do not
    # change between iterations, so the search is done once before the loop
    p_values = d_values = q_values = range(0, 2)
    order = evaluate_models(X, p_values, d_values, q_values)
    # walk forward: refit on the growing history and forecast one step ahead
    for t in range(len(test)):
        model = SARIMAX(history, order=order)
        model_fit = model.fit(disp=0)
        yhat = model_fit.forecast()[0]
        predictions.append(yhat)
        obs = test.iloc[t]
        history.append(obs)
        a.append(obs)
    # the remainder of the view is not shown in this excerpt


def evaluate_models(dataset, p_values, d_values, q_values):
    dataset = dataset.astype('float32')
    # generate different combinations of order
    pdq = list(itertools.product(p_values, d_values, q_values))
    err = []
    order = []
    for param in pdq:
        try:
            err.append(evaluate_sarima_model(dataset, param))
            order.append(param)
        except Exception:
            # skip orders for which the model cannot be fitted
            continue
    # return the order with the smallest MAPE
    return order[err.index(min(err))]


def error_mape(y_true, y_pred):
    # mean absolute percentage error, in percent
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100


def evaluate_sarima_model(X, arima_order):
    # prepare training dataset
    train_size = int(len(X) * 0.98)
    train, test = X[0:train_size], X[train_size:]
    history = list(train)
    predictions = []
    for t in range(len(test)):
        # ARIMA from statsmodels.tsa.arima.model: fit() takes no disp argument
        # and forecast() returns an array with the one-step forecast
        model_fit = ARIMA(history, order=arima_order).fit()
        predictions.append(model_fit.forecast()[0])
        history.append(test[t])
    return error_mape(test, predictions)
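
The walk-forward loop used in the script above can be isolated from the model itself. In the sketch below the forecaster is a naive stand-in that repeats the last observation (it is not ARIMA), so that the mechanics of the split and the growing history can be run without any dependencies. The function names and sample values are assumptions made for this illustration.

```python
def walk_forward(series, train_fraction, forecast_next):
    """Forecast one step ahead, then add the true observation to the history."""
    size = int(len(series) * train_fraction)
    history = list(series[:size])
    test = list(series[size:])
    predictions = []
    for obs in test:
        predictions.append(forecast_next(history))
        history.append(obs)  # the true value becomes available for the next step
    return test, predictions


def last_value(history):
    # naive stand-in forecaster (not ARIMA): repeat the last observation
    return history[-1]


series = [100, 102, 101, 105, 107, 110, 108, 111, 115, 117]  # illustrative values
test, predictions = walk_forward(series, 0.8, last_value)
print(test)
print(predictions)
```

Replacing last_value with a function that fits a model on history and returns its one-step forecast gives the same structure as the script.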

4.1.3.2. News module:


The system includes a news portal feature to improve understanding and to let people know
the current news of the stock market. News of different stock companies is displayed
by category, such as hydro, finance etc. It is implemented dynamically for easy accessibility.
The database model for the news portal is designed as follows:

from django.db import models


class Category(models.Model):
    title = models.CharField(max_length=200)
    image = models.ImageField(upload_to="category")

    def __str__(self):
        return self.title


class News(models.Model):
    title = models.CharField(max_length=200)
    category = models.ForeignKey(Category, on_delete=models.CASCADE)
    image = models.ImageField(upload_to="news")
    detail = models.TextField()
    date = models.DateTimeField(auto_now_add=True)
    author = models.CharField(max_length=200, null=True, blank=True)

    def __str__(self):
        return self.title

The news posted by the admin is presented to users through the views given below:

from django.views.generic import DetailView, ListView

from .models import Category, News  # module path assumed for this excerpt


class CategoryListView(ListView):
    # newest categories first
    template_name = "minor/news.html"
    queryset = Category.objects.all().order_by('-id')
    context_object_name = "allcategories"


class NewsDetailView(DetailView):
    template_name = "minor/newsdetail.html"
    model = News
    context_object_name = "newsobject"

4.1.3.3.Admin dashboard:
The dashboard is implemented by extending the Django admin. The dashboard is customised
with the Django AdminLTE theme. It contains all the
CRUD features along with user management. The script for admin dashboard customization is
given below:

from django.contrib import admin

from .models import Company, Data, Result  # module path assumed for this excerpt


class CompanyAdmin(admin.ModelAdmin):
    list_display = ['company_name', 'email', 'symbol']
    search_fields = ('company_name', 'email', 'symbol')


admin.site.register(Company, CompanyAdmin)


class ResultAdmin(admin.ModelAdmin):
    list_display = ['obs', 'pre', 'date', 'company']
    # a foreign key cannot be searched directly, so search on the company name
    search_fields = ('obs', 'pre', 'date', 'company__company_name')


admin.site.register(Result, ResultAdmin)


class DataAdmin(admin.ModelAdmin):
    list_display = ['obs_data', 'date', 'company']
    search_fields = ('obs_data', 'date', 'company__company_name')


admin.site.register(Data, DataAdmin)

4.1.3.4. Company Visualization:


The closing value time series data and the predicted results are visualized on the web using the
JavaScript library Highcharts. Along with the visualization, company details such as name, symbol,
average value, today's value and the future approximation are listed in a table. The view script that
reads the data from the database and passes it to the template is given below:
import pandas as pd
from django.shortcuts import render
from django_pandas.io import read_frame

from .models import Company, Data, FuturePrediction, Result  # module path assumed


def test(request, pk):
    comp = Company.objects.get(id=pk)
    # observed closing values of the company
    data = read_frame(Data.objects.filter(company=pk))
    data['date'] = pd.to_datetime(data['date']).astype('str')
    _date = data['date'].values.tolist()
    _obs = data['obs_data'].values.tolist()
    # minimum(), maximum() and average() are project helpers not shown in this
    # excerpt; minimum() and maximum() return the index of the extreme value
    min_obs_index = minimum(_obs, len(_obs))
    max_obs_index = maximum(_obs, len(_obs))
    # chart only the last 1000 observations, as [date, value] pairs
    series1 = [{"data": [list(p) for p in zip(_date[-1000:], _obs[-1000:])]}]
    # predicted and observed values of the test period
    result = read_frame(Result.objects.filter(company=pk))
    result['date'] = pd.to_datetime(result['date']).astype('str')
    _res_date = result['date'].values.tolist()
    _pre = result['pre'].values.tolist()
    _test = result['obs'].values.tolist()
    series2 = [{"data": [list(p) for p in zip(_res_date, _pre)]}]
    series4 = [{"data": [list(p) for p in zip(_res_date, _test)]}]
    # forecast of future values
    future_prediction = read_frame(FuturePrediction.objects.filter(company=pk))
    _forcast = future_prediction['for_data'].values.tolist()
    return render(request, 'minor/test2.html', {
        "series1": series1,
        "series2": series2,
        "series4": series4,
        'min_obs': _obs[min_obs_index],
        'min_date': _date[min_obs_index],
        'max_obs': _obs[max_obs_index],
        'max_date': _date[max_obs_index],
        'avg': average(_obs),
        "name": comp.company_name,
        'symbol': comp.symbol,
        'email': comp.email,
        'today': _obs[-1],
        'tomorrow': _forcast[1],
    })
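
The view above relies on the helpers minimum, maximum and average, whose definitions are not shown in this report. The sketch below gives one possible definition, inferred only from how the helpers are called, together with the pairing of dates and values into the list-of-pairs form passed to the chart. The definitions and the sample values are assumptions, not the project's own code.

```python
import json


def minimum(values, n):
    """Index of the smallest of the first n values (assumed behaviour)."""
    return min(range(n), key=lambda i: values[i])


def maximum(values, n):
    """Index of the largest of the first n values (assumed behaviour)."""
    return max(range(n), key=lambda i: values[i])


def average(values):
    return sum(values) / len(values)


dates = ["2019-02-24", "2019-02-25", "2019-02-26"]  # illustrative values
closes = [250.0, 244.0, 259.0]

# one chart series: a list of [date, value] pairs
series1 = [{"data": [list(pair) for pair in zip(dates, closes)]}]
payload = json.dumps(series1)

print(dates[minimum(closes, len(closes))], dates[maximum(closes, len(closes))])
print(average(closes))
print(payload)
```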

4.1.3.5. StockBot :
StockBot is an AIML-based, stock-related chatbot that lets users chat with the system. It is
implemented using AIML in the Flask micro framework. The rule-based chatbot consists of a
markup language with basic tags such as <aiml>, <category>, <pattern> and <template>. It
involves creating standard startup files, creating AIML files and including responses in the
AIML files. The basic script is given below:
<?xml version="1.0" encoding="ISO-8859-1"?>
<aiml version="1.0">
  <meta name="author" content="Dr. Deven"/>
  <meta name="language" content="en"/>
  <category>
    <pattern>WHAT IS YOUR NAME</pattern>
    <template>
      I am stockbot and you?<think><set name="it"><set name="topic">STOCK EXCHANGE</set></set></think>
    </template>
  </category>
  <category>
    <!-- patterns are matched against normalized input, so they carry no punctuation -->
    <pattern>WHO INVENT YOU</pattern>
    <template>
      I am invented by Programmer Deven<think><set name="it"><set name="topic">INVENT</set></set></think>
    </template>
  </category>
</aiml>
The startup file script is as follows:
<aiml version="1.0">
  <!-- This category works with the standard AIML Set -->
  <category>
    <pattern>LOAD AIML B</pattern>
    <template>
      <!-- Load standard AIML set -->
      <learn>aiml/stock..aiml</learn>
    </template>
  </category>
</aiml>
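
The idea behind such rule-based matching can be shown without the AIML library or Flask. The sketch below is a simplified stand-in: it normalizes the user input (upper case, punctuation removed), in a way similar to what AIML interpreters do before matching, and looks the result up in a table built from the two categories above. The function names and the default reply are assumptions made for this example.

```python
import string

# patterns and replies taken from the two categories in the AIML file
RULES = {
    "WHAT IS YOUR NAME": "I am stockbot and you?",
    "WHO INVENT YOU": "I am invented by Programmer Deven",
}


def normalize(text):
    """Upper-case the input, drop punctuation and collapse whitespace."""
    cleaned = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.upper().split())


def respond(text, default="Sorry, I do not understand."):
    # exact-match lookup; wildcard patterns are not covered by this sketch
    return RULES.get(normalize(text), default)


print(respond("What is your name?"))
print(respond("who invent you?"))
print(respond("hello"))
```

Because the input is normalized before the lookup, a pattern that itself contained a question mark could never match, which is why patterns are written without punctuation.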

4.1.4 Implementation Tools


Python is used to implement the programming logic of the system. The front end of the
system is developed using HTML, CSS and JS.
The back end of the system is developed in Python using the Django web development
framework.

4.1.5 Other Tools and Platforms


GitLab
GitLab is a web-based version control platform used for collaboration.
Django
Django is a high-level Python web framework that encourages rapid development and clean
design. It is free and open source. Our system is based on this framework.
Sublime Text
Sublime Text is a super fast and feature packed text and development editor. It is a
proprietary cross-platform source code editor with a Python application programming
interface.
AIML
AIML stands for Artificial Intelligence Markup Language. AIML is an XML-based
markup language meant for creating artificial intelligence applications. AIML is used to create
the chatbot in the system.
Flask
Flask is a lightweight web application framework. It is designed to make getting started
quick and easy, with the ability to scale up to complex applications. It is used to create
chatbot in this system.
Scrapy
Scrapy is an open-source Python framework for crawling websites and extracting structured
data from their pages. It commonly saves programmers hours or days of work. It is used to
scrape the stock data in this system.
Pandas
Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use
data structures and data analysis tools for the Python programming language. Python has
long been great for data munging and preparation, but less so for data analysis and
modeling. Pandas helps fill this gap, enabling you to carry out your entire data analysis
workflow in Python.
Highcharts
Highcharts is an SVG-based, multi-platform charting library that has been actively
developed since 2009. It makes it easy to add interactive, mobile-optimized charts to your
web and mobile projects. It features robust documentation, advanced responsiveness and
industry-leading accessibility support.

4.2 Testing

4.2.1 Test Case


Table: Test Case of the Stock Market Analysis and Prediction

| Test Case Name | Description | Precondition | Test Steps | Expected Results |
|---|---|---|---|---|
| TC_01: Sign Up | A user should be able to register into the system by providing authentic information. A registration authentication link is sent to the user's email. | The user should have an email. | 1. Navigate to the signup page. 2. Fill all the required form fields along with the email, username and password. 3. Click the 'Submit' button. | An authentication link should appear in the user's email inbox. |
| TC_02: Login Failure | Any unauthorized and unauthentic user must not be able to log in to the system. | The username and password are available. | 1. Navigate to the login page. 2. Enter the unauthenticated username and password. 3. Press the 'Login' button. | A message should be generated that notifies the login attempt failure. |
| TC_03: Login Success | Any authentic and authorized user should be able to log in to the system by providing their username and password. | The user should be registered. | 1. Navigate to the login page. 2. Enter the authentic username and password. 3. Press the 'Login' button. | A message of successful login and the dashboard for the user should be displayed. |
| TC_04: Analysis and prediction | Analysis is done with the test data, i.e. the historical data is divided into two parts, train and test data, and the future prediction is generated. | Historical data should be present. | 1. Historical data should be divided into two parts. 2. Test data are analyzed using the ARIMA model. 3. Error is generated using the test data and the predicted data. | Above 90% accurate prediction is obtained. |

4.2.2 Test Scripts


from django.test import Client, LiveServerTestCase


class UserTest(LiveServerTestCase):
    def setUp(self):
        self.client = Client()

    def test_login(self):
        # Get login page; the admin index redirects anonymous users to the login
        # form, so the redirect is followed
        response = self.client.get('/admin/', follow=True)
        # Check response code
        self.assertEqual(response.status_code, 200)
        # Check 'Log in' in response
        self.assertContains(response, 'Log in')
        # Log the user in
        self.client.login(username='XXX', password="XXX")
        # Check response code
        response = self.client.get('/admin/')
        self.assertEqual(response.status_code, 200)
        # Check 'Log out' in response
        self.assertContains(response, 'Log out')

    def test_logout(self):
        # Log in
        self.client.login(username='XXX', password="XXX")
        # Check response code
        response = self.client.get('/admin/')
        self.assertEqual(response.status_code, 200)
        # Check 'Log out' in response
        self.assertContains(response, 'Log out')
        # Log out
        self.client.logout()
        # Check response code; the redirect to the login form is followed
        response = self.client.get('/admin/', follow=True)
        self.assertEqual(response.status_code, 200)

CHAPTER 5 CONCLUSION & FUTURE ENHANCEMENTS
5.1 Conclusion
Stock analysis is itself a cumbersome task to undertake. By using
both algorithms, a sustainable prediction level has been achieved. By successfully scraping,
cleaning and storing the data, our system is able to predict the future values of
the stocks.

The final system is a web-based application that is able to visualize the historic time
series data and the future prediction, along with news and chatbot features. The web-based
application, built in Django with a database and visualization tools, is
able to show interactive plots of the scores. Finally, we were able to achieve our
objectives through the system we built. The system can predict the value of a company's stock
according to the data provided to train it. We can analyze the current state of
the market. The simple interface and interactive charts of the system make stock analysis
easy for its users. The time series statistical model ARIMA has been
implemented and has achieved a high accuracy rate. Our system is able to predict the stock
values of all the companies using the closing value only. Besides reaching our main objective
of predicting the value, we were able to add different features to our system. We have added the
news feature, where users are given access to view different stock news.
A chatbot feature has also been added, where users can interact with the chatbot to get
information about the stock market.

Although we have reached our objectives, we have not been able to achieve complete
accuracy. We achieved accuracy of at most 95% and at least 90%. We
will add other features in the future to increase accuracy.

5.2 Future Enhancement

The proposed system is to be developed with inclusion of more companies in the future
along with multiple news sources.

The current system is built using the autoregressive integrated moving average (ARIMA) model.
To increase the accuracy, different combinations of ARIMA order were generated. By selecting
the best ARIMA order, we are able to obtain accuracy of 90% or higher. A system is never fully
complete, as it can be enhanced in the future using different methods.
Some of the future enhancements that can be made to the system are:
1. We can predict the stock value based on additional parameters such as opening values,
turnover etc.
2. We can add additional features such as alerting the user about the price rise or fall of
different companies' stocks.
3. We can further integrate different algorithms to enhance the accuracy of the system.

