Technical Seminar Report
Bachelor of Engineering
In
Computer Science & Engineering
Submitted by
(1SP20CS028) LIPIKA
2023-2024
VISVESVARAYA TECHNOLOGICAL UNIVERSITY
“Jnana Sangama”, Belagavi-590018, Karnataka
Certificate
This is to certify that the Technical Seminar work entitled “Prediction delay in airlines” was carried out by
Ms. Lipika, bearing USN 1SP20CS028, a bonafide student of VIII semester B.E., in partial
fulfillment of the requirements for the Bachelor’s Degree in Computer Science & Engineering of the
VISVESVARAYA TECHNOLOGICAL UNIVERSITY, during the year 2023-2024. It is certified that
all corrections/suggestions indicated for Internal Assessment have been incorporated in the report
deposited in the department library. The Technical Seminar report has been approved as it satisfies
the academic requirements in respect of the Technical Seminar work prescribed for the said degree.
Acknowledgement
Firstly, I thank the Management late Shri. A KRISHNAPPA, Chairman of SEA College of
Engineering and Technology for providing necessary infrastructure and creating a good environment.
I am thankful to our principal Dr. B.VENKATA NARAYANA, who is responsible for creating such
a pleasant environment and appreciating my talent in both academic and extracurricular activities.
Lastly, I thank my parents, friends, lecturers and staff who provided me the much-needed moral
support while pursuing this project.
LIPIKA
(1SP20CS028)
CONTENTS
1 Introduction 1
1.1 Machine Learning
1.2 Logistic Regression Algorithm
1.3 Decision Tree Algorithm
1.4 Random Forest Algorithm
2 Literature Survey 8
3 Problem Statement 11
3.1 Existing System
3.1.1 Disadvantages
4 Development Process 12
4.1 Requirement Analysis
4.2 Resource Requirements
4.3 System Design
4.4 System Architecture
4.5 Module Description
5 System Study 21
5.1 Feasibility Study
5.1.1 Economic Feasibility
5.1.2 Technology Feasibility
5.1.3 Social Feasibility
6 Testing 23
6.1 Types of Tests
6.1.1 Unit Testing
6.1.2 Integration Testing
6.1.3 Function Test
6.1.4 System Test
Result 29
INTRODUCTION
Flight delays have been studied extensively in recent years. The growing
demand for air travel has led to an increase in flight delays. According to the Federal
Aviation Administration (FAA), the aviation industry loses more than $3 billion a year
due to flight delays and, according to the Bureau of Transportation Statistics (BTS), there were
860,646 arrival delays in 2016. The main reasons for delays of scheduled commercial flights
are air traffic congestion, the yearly growth in passenger numbers, maintenance and safety
problems, adverse weather conditions, and the late arrival of the aircraft to be used for the
next flight. In the United States, the FAA considers a flight delayed when its scheduled and
actual arrival times differ by more than 15 minutes. Because delays have become a serious
problem in the United States, the analysis and prediction of flight delays are being studied
in order to reduce these large costs.
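The 15-minute rule above can be turned into a label for each flight. The following is a minimal sketch in Python, assuming that only late arrivals count as delays (an early arrival is treated as not delayed); the function name is_delayed and the sample timestamps are illustrative and do not come from any dataset used in this report.

```python
from datetime import datetime, timedelta

# Threshold taken from the text: a flight counts as delayed when it
# arrives more than 15 minutes after its scheduled arrival time.
DELAY_THRESHOLD = timedelta(minutes=15)


def is_delayed(scheduled_arrival: datetime, actual_arrival: datetime) -> bool:
    """Return True when the arrival is more than 15 minutes late.

    Assumption: early arrivals are never labeled as delayed.
    """
    return actual_arrival - scheduled_arrival > DELAY_THRESHOLD


# Illustrative timestamps (hypothetical, not real flight records).
scheduled = datetime(2016, 1, 1, 10, 0)
exactly_15 = datetime(2016, 1, 1, 10, 15)  # exactly 15 minutes late
late_16 = datetime(2016, 1, 1, 10, 16)     # 16 minutes late

print(is_delayed(scheduled, exactly_15))  # False
print(is_delayed(scheduled, late_16))     # True
```

A label of this kind can serve as the target (output) column when flight delay prediction is treated as a classification problem.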
Machine learning has changed the way we think about the problem.
Figure: Block diagram of the working of a machine learning algorithm
Machine learning can be broadly classified into three types:
1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning
1. Supervised Learning
Supervised learning is a type of machine learning in which we provide sample
labeled data to the machine learning system in order to train it; on that basis, it predicts
the output.
The system builds a model from the labeled data to learn the relationship between the inputs
and their labels. Once training is complete, we test the model on sample data to check
whether it predicts the correct output.
The goal of supervised learning is to map input data to output data. Supervised
learning is based on supervision, in the same way that a student learns under the
supervision of a teacher. An example of supervised learning is spam filtering.
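The train-then-test workflow described above can be sketched as follows. This is only an illustration, assuming a 1-nearest-neighbour rule as the learner (the report does not prescribe it here); the feature values and labels are made-up toy data.

```python
import math


def predict_1nn(train, x):
    """Return the label of the training sample whose features are closest to x."""
    _, nearest_label = min(train, key=lambda sample: math.dist(sample[0], x))
    return nearest_label


# Hypothetical labeled training data:
# (departure delay in minutes, wind speed in knots) -> label
train = [
    ((0.0, 5.0), "on_time"),
    ((5.0, 8.0), "on_time"),
    ((40.0, 30.0), "delayed"),
    ((55.0, 25.0), "delayed"),
]

# Held-out samples with known labels, used only to check the model.
test = [
    ((3.0, 6.0), "on_time"),
    ((50.0, 28.0), "delayed"),
]

correct = sum(predict_1nn(train, x) == y for x, y in test)
accuracy = correct / len(test)
print(accuracy)  # 1.0
```

The key point is that the labels in the training set supervise the learner, while the held-out samples measure how well the learned mapping generalizes.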
2. Unsupervised Learning
Unsupervised learning is a learning method in which a machine learns without any
supervision. The machine is trained on a set of data that has not been
labeled, classified, or categorized, and the algorithm must act on that data without any
supervision. The goal of unsupervised learning is to restructure the input data into new
features or into groups of objects with similar patterns.
In unsupervised learning, there is no predetermined result. The machine tries to find
useful insights in a large amount of data.
Unsupervised learning algorithms can be further classified into two categories:
a. Clustering
b. Association
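As an illustration of clustering, the sketch below groups unlabeled one-dimensional values into two clusters with a basic k-means loop. The choice of k-means, the fixed number of two clusters, the deterministic starting centroids and the sample values are all assumptions made for this example.

```python
def kmeans_1d(values, iterations=10):
    """Group 1-D values into two clusters using a basic k-means loop."""
    # Simple deterministic start: smallest and largest value.
    centroids = [min(values), max(values)]
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            # Assign each value to its nearest centroid.
            idx = 0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            clusters[idx].append(v)
        # Move each centroid to the mean of its cluster;
        # keep the old centroid if the cluster is empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters, centroids


# Hypothetical arrival delays in minutes, with no labels attached.
delays = [2, 4, 5, 58, 60, 65]
clusters, centroids = kmeans_1d(delays)
print(clusters)  # [[2, 4, 5], [58, 60, 65]]
```

No labels are supplied; the grouping into short and long delays emerges from the data alone.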
Logistic Regression is very similar to Linear Regression; the difference lies in how they are
used. Linear Regression is used for solving regression problems, whereas Logistic
Regression is used for solving classification problems. In Logistic Regression, instead of
fitting a regression line, we fit an "S"-shaped logistic function whose output lies between
0 and 1. The curve of the logistic function indicates the likelihood of something, such as
whether cells are cancerous or not, or whether a mouse is obese or not based on its weight.
Logistic Regression is a significant machine learning algorithm because it can
provide probabilities and classify new data using both continuous and discrete datasets. Logistic
Regression can classify observations using different types of data and can
easily identify the variables that are most effective for the classification.
Figure: The logistic function
• In logistic regression, we use a threshold value to turn the predicted probability
into a class label: probabilities above the threshold are assigned to class 1, and
probabilities below the threshold are assigned to class 0.
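The logistic function and the threshold rule can be sketched in a few lines of Python. The weight and bias below are hypothetical values chosen for illustration, not coefficients fitted to any dataset, and the default threshold of 0.5 is an assumption.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of class 1 for one observation."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def classify(probability, threshold=0.5):
    """Class 1 when the probability is above the threshold, otherwise class 0."""
    return 1 if probability > threshold else 0


# Hypothetical model: one feature (departure delay in minutes).
weights = [0.08]
bias = -2.0

for departure_delay in (0, 20, 60):
    p = predict_proba([departure_delay], weights, bias)
    print(departure_delay, round(p, 3), classify(p))
```

With these illustrative values, short departure delays give probabilities below 0.5 (class 0), while a 60-minute departure delay gives a probability well above 0.5 (class 1).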
Decision Tree is a supervised learning technique that can be used for both classification and
regression problems, but it is mostly preferred for solving classification problems. It is a
tree-structured classifier, where internal nodes represent the features of a dataset, branches
represent the decision rules and each leaf node represents the outcome.
In a decision tree, there are two types of nodes: the Decision Node and the Leaf Node. Decision
nodes are used to make a decision and have multiple branches, whereas leaf nodes are the
output of those decisions and do not contain any further branches.
In order to build a tree, we use the CART algorithm, which stands for Classification and
Regression Tree algorithm.
A decision tree simply asks a question and, based on the answer (Yes/No), further splits the
tree into subtrees.
a. Decision Trees usually mimic human thinking ability while making a decision,
so they are easy to understand.
b. The logic behind the decision tree can be easily understood because it shows a
tree-like structure.
In a decision tree, to predict the class of a given record, the algorithm starts from the
root node of the tree. It compares the value of the root attribute with the corresponding
attribute of the record (real dataset) and, based on the comparison, follows the branch and
jumps to the next node. At the next node, the algorithm again compares the attribute value
with the sub-nodes and moves further. It continues this process until it reaches a leaf node
of the tree. The complete process can be better understood using the algorithm below:
• Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
• Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
• Step-3: Divide S into subsets that contain the possible values of the best attribute.
• Step-4: Generate the decision tree node, which contains the best attribute.
• Step-5: Recursively build new decision trees using the subsets of the dataset created in
Step-3. Continue this process until a stage is reached where the nodes cannot be classified
further; such a final node is called a leaf node.
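The steps above can be sketched as a small recursive tree builder. This is a simplified illustration rather than a full CART implementation: it assumes numeric features, binary splits of the form "value <= threshold", and Gini impurity as the attribute selection measure (the text does not name a specific measure). The function names and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_split(rows, labels):
    """Step 2: return (score, feature, threshold) with the lowest weighted Gini."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must send records to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels):
    """Steps 1 to 5: split recursively until a node is pure or cannot be split."""
    split = best_split(rows, labels) if len(set(labels)) > 1 else None
    if split is None:
        # Leaf node: store the majority class.
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    _, f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left]),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right]),
    }


def predict(tree, row):
    """Start at the root and follow branches until a leaf node is reached."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


# Hypothetical records: (departure delay in minutes, distance in miles).
rows = [(5, 300), (10, 1200), (45, 400), (60, 1500)]
labels = ["on_time", "on_time", "delayed", "delayed"]

tree = build_tree(rows, labels)
print(predict(tree, (8, 900)))   # on_time
print(predict(tree, (50, 700)))  # delayed
```

On this toy data the builder splits once on the departure delay (feature 0, threshold 10), which already separates the two classes, so both children become leaf nodes.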
Random Forest is a popular machine learning algorithm that belongs to the supervised
learning technique. It can be used for both classification and regression problems in ML.
It is based on the concept of ensemble learning, which is the process of combining multiple
classifiers to solve a complex problem and to improve the performance of the model.
As the name suggests, "Random Forest is a classifier that contains a number of decision
trees on various subsets of the given dataset and takes the average to improve the
predictive accuracy of that dataset." Instead of relying on one decision tree, the random
forest takes the prediction from each tree and predicts the final output based on the
majority vote of those predictions.
A greater number of trees in the forest generally leads to higher accuracy and helps to
reduce the problem of overfitting.
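The idea of training many trees on different subsets of the data and taking a majority vote can be sketched as follows. This is a deliberately simplified illustration: each "tree" is a one-split decision stump on a single numeric feature, each subset is a bootstrap sample (drawn with replacement), and the random selection of features used by a full random forest is omitted. The function names, the seed and the toy data are assumptions made for this example.

```python
import random
from collections import Counter


def train_stump(samples):
    """Fit a one-split tree: x <= threshold gives one label, x > threshold another."""
    majority = Counter(y for _, y in samples).most_common(1)[0][0]
    best = None  # (errors, threshold, left_label, right_label)
    for t in sorted({x for x, _ in samples}):
        left = [y for x, y in samples if x <= t]
        right = [y for x, y in samples if x > t]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else majority
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, t, left_label, right_label = best
    return lambda x: left_label if x <= t else right_label


def train_forest(samples, n_trees=25, seed=0):
    """Train each stump on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(samples) for _ in samples]  # with replacement
        forest.append(train_stump(bootstrap))
    return forest


def predict_forest(forest, x):
    """Collect one vote per tree and return the majority class."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]


# Hypothetical data: departure delay in minutes -> label.
data = [
    (2, "on_time"), (5, "on_time"), (8, "on_time"), (12, "on_time"),
    (40, "delayed"), (55, "delayed"), (62, "delayed"), (70, "delayed"),
]

forest = train_forest(data)
print(predict_forest(forest, 1))
print(predict_forest(forest, 60))
```

Because every stump sees a slightly different sample, individual stumps may disagree near the class boundary; the majority vote smooths out these differences.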
Below are some points that explain why we should use the Random Forest algorithm:
The working process can be explained in the steps and diagram below: