
International Journal of INTELLIGENT SYSTEMS AND APPLICATIONS IN ENGINEERING

ISSN: 2147-6799    www.ijisae.org    Original Research Paper

A Review of Various Datasets for Machine Learning Algorithm-Based


Intrusion Detection System: Advances and Challenges
Sudhanshu Sekhar Tripathy1, Dr. Bichitrananda Behera2
Submitted: 15/03/2024    Revised: 29/04/2024    Accepted: 06/05/2024

Abstract: An IDS aims to protect computer networks from security threats by detecting intrusions, notifying administrators, and taking appropriate action to prevent illegal access and protect confidential information. As the globe becomes increasingly dependent on technology and automated processes, securing systems, applications, and networks has become one of the most significant problems of this era. The global web and digital technology have significantly accelerated the evolution of the modern world, necessitating reliable telecommunications and data transfer platforms. Researchers are enhancing the effectiveness of IDS by applying machine learning algorithms to popular datasets. An IDS equipped with machine learning classifiers improves security attack detection accuracy by identifying network traffic as normal or abnormal. This paper explores methods of building and reviewing intrusion detection systems (IDS) and evaluates the challenges existing datasets face. A deluge of research on machine learning (ML) and deep learning (DL) architecture-based intrusion detection techniques has been conducted in the past ten years on a variety of cyber-security datasets, including KDDCUP'99, NSL-KDD, UNSW-NB15, CICIDS-2017, and CSE-CIC-IDS2018. We conducted a literature review and present an in-depth analysis of intrusion detection methods that use SVM, KNN, DT, LR, NB, RF, XGBoost, AdaBoost, and ANN classifiers. We give an overview of each technique, explaining the function of the classifiers mentioned above and all other algorithms used in the research. Additionally, a comprehensive analysis of each method is provided in tabular form, emphasizing the dataset utilized, classifiers employed, attacks detected, the evaluation metrics, and conclusions drawn from every technique investigated. This article provides a comprehensive overview of recent research on developing a reliable IDS using five distinct datasets, and the findings were carefully analyzed and contrasted across the numerous investigations reviewed.

Keywords: Intrusion Detection System, ML classifiers, Different IDS datasets, Evaluation metrics with accuracy, Detected assaults

1. Introduction

The rapid growth of the information technology field in the last ten years has made creating reliable computer networks a crucial task for IT managers. However, this task is challenging due to the numerous threats that can compromise the confidentiality, integrity, and availability of these networks, leaving them vulnerable to various risks [1]. The Internet is a crucial tool in everyday life, used in commerce, education, the medical sector, entertainment, and many other fields. As technology advances, it becomes more common to use networks in various aspects of life. However, an attack on the network poses a risk precisely because of this popularity.

IDS is a component of computer software that analyses an entire infrastructure or network for fraudulent behavior or violations of restrictions. People now depend drastically on web access for practically all facets of daily existence, as the web has completely transformed communication and our way of life. As a result, online privacy has emerged as one of the most important and pressing problems of our day. Escalating and ever more powerful digital attacks, crimes, and hacking have resulted from our increasing reliance on digital infrastructure and software applications.

Numerous security solutions have been extensively explored and implemented over the years to defend against them, including firewalls, intrusion detection systems, cryptography, and encryption and decryption approaches. Due to its capacity to detect, track, and prevent intrusions by exploiting already known concepts and trends, intrusion detection [2] is regarded as the initial stage of protection against complicated and dynamic invasions [3].

Intrusion is the process of gaining illegitimate entry to networks or services by tampering with the infrastructure and rendering it vulnerable. Information security comprises three core principles: confidentiality, integrity, and availability. Integrity ensures data remains accurate and unaltered, availability ensures that data are accessible to authorized users, and confidentiality restricts unauthorized access to and sharing of personal data. Intrusion detection systems identify intruders but are susceptible to false alarms. Organizations must adjust IDS products post-implementation to prevent false alarms [4].

This review of the literature examines various IDS computational algorithms, including Support Vector Machine (SVM), K-Nearest Neighbour (KNN), Decision Tree Classifier (DT), Logistic Regression (LR), Naive Bayes Classifier (NB), Random Forest Classifier (RF), Extreme Gradient Boosting Classifier

1 C V Raman Global University, Bhubaneswar–752054, Odisha. ORCID ID: 0009-0003-5567-458X
2 C V Raman Global University, Bhubaneswar–752054, Odisha. ORCID ID: 0000-0002-9362-7691
*Corresponding Author Email: [email protected]

International Journal of Intelligent Systems and Applications in Engineering IJISAE, 2024, 12(4), 3833–3857 | 3833
(XGBOOST), Adaboost Classifier (Adaptive Boosting), Artificial Neural Network (ANN), and Deep Neural Network (DNN), and tests their performance on five different datasets.

1.1. Intrusion Detection System (IDS)

An intrusion implies that an individual operates on a system's service erroneously or without authorization. The main objective of intrusion is to compromise a resource's availability, confidentiality, and integrity. In reality, a malicious individual strives to gain access to data without authority and triggers damage through any illicit activity that might be operating.

Intrusion Detection Systems (IDS) are surveillance systems that monitor computer systems and network activity, detecting illicit activity and alerting network administrators to protect data. They enhance network privacy and data protection for enterprises, ensuring firewall security and preventing hackers from compromising secure connectivity. IDS also improves computer networks by monitoring for various types of attacks.

An intrusion detection system (IDS) is a network infrastructure device that detects and alerts users to suspicious web traffic, potentially restricting traffic from dubious IP addresses, and is installed to detect anomalous activity, as depicted in Figure 1.

Figure 1: The installation of IDS in a network infrastructure

Cybersecurity is crucial due to the internet's influence, with antivirus software, firewalls, and intrusion detection systems being key tools. The expansion of computer networks and app usage has made attacks common. HIDS and NIDS are two types of intrusion detection systems. HIDS monitors the activities on individual hosts or devices, as depicted in Figure 2, notifying administrators of suspicious activity. NIDS analyses all traffic, employing methods like packet sniffing, to detect unauthorized access. Signature detection, also known as misuse detection, relies on known patterns of attacks, while anomaly detection, a method used in NIDS, identifies deviations from normal behavior.

Figure 2: A pictorial representation of NIDS vs HIDS

Signature-based intrusion detection systems use a database of known attack signatures to identify attacks. They track network packets and generate alarms if they match. However, they can only detect registered intrusions and cannot identify new ones.

Anomaly-based intrusion detection systems, on the other hand, search for unidentified threats. They monitor network traffic and alert system administrators to unusual behaviour, as depicted in Figures 3 and 4.

Figure 3: Signature-Based IDS

Figure 4: Anomaly-Based IDS

IDS sends alerts to hosts or network administrators about malicious behavior while connected to a network switch non-inline, using port-mirroring technology. It monitors traffic, detects intrusions, and can be installed between network switches and firewalls, as depicted in Figure 5.

Intrusion detection involves real-time monitoring and analysis of data and networks for potential vulnerabilities and active attacks. Machine learning methods are being used for intrusion detection, but their inability to filter out false alarms is a major weakness. So, to reduce false alarms in ML-based intrusion detection, we must have quality datasets and advanced algorithms and must regularly evaluate their performance.

IDS is effective at spotting network attacks, but it suffers from a large number of false positive alerts. This can burden cyber security analysts and lower the effectiveness of the system. Signature-based IDSs produce fewer false positives than anomaly-based IDSs; however, misuse-based IDSs nonetheless frequently suffer reduced detection output because of false alarm rates. To minimise false positives, researchers are examining techniques such as machine learning, deep learning, control charts, and intelligent false alarm filters.

This paper analyses relevant literature to review the performance of current datasets and network security methods in defending computer systems from cyber-attacks, emphasizing the need for new approaches and enhanced IDS technologies.
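The contrast between the two detection styles can be shown with a toy sketch (not from the paper; the signature set, traffic rates, and threshold are all invented): a signature engine only matches registered patterns, while an anomaly detector flags statistical deviations from a learned baseline, which is why only the latter can surface a brand-new attack.

```python
# Toy sketch: signature vs. anomaly detection. Signatures catch only
# known patterns; the anomaly rule flags deviations from a baseline.
from statistics import mean, stdev

KNOWN_SIGNATURES = {"SYN_FLOOD_PATTERN", "SQLI_UNION_SELECT"}  # hypothetical

def signature_detect(packet_pattern: str) -> bool:
    """Alert only if the pattern matches a registered signature."""
    return packet_pattern in KNOWN_SIGNATURES

def anomaly_detect(baseline_rates, observed_rate, k: float = 3.0) -> bool:
    """Alert if the traffic rate deviates > k standard deviations."""
    mu, sigma = mean(baseline_rates), stdev(baseline_rates)
    return abs(observed_rate - mu) > k * sigma

# A brand-new attack: invisible to signatures, visible as an anomaly.
baseline = [100, 104, 98, 101, 99, 103]             # normal packets/sec
print(signature_detect("ZERO_DAY_PATTERN"))         # False: unknown signature
print(anomaly_detect(baseline, observed_rate=900))  # True: rate spike
```

The trade-off discussed above appears directly: the anomaly rule would also fire on any benign burst that exceeds the threshold, which is the false-alarm problem the surveyed datasets and classifiers try to address.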

International Journal of Intelligent Systems and Applications in Engineering IJISAE, 2024, 12(4), 3833–3857 | 3834
1.2. Conceptual layout of the system for intrusion detection

Figure 6 depicts the intrusion detection system layout. Its principal components are as follows:

• Pre-processing phase: In this phase, packet sniffing tools like Wireshark and Capsa extract features from each packet, dividing them based on source and destination addresses. However, these tools do not determine whether a packet is normal or intrusive.

• Classification: The classification phase uses data from the previous phase to determine whether a packet is an attack or normal, with algorithms categorizing packets into similar groups based on feature values. The data used here are divided into (i) training data and (ii) testing data.

• Training phase: The training phase provides the response class and packet attributes for mapping domain selection rules. These rules cannot be modified or replaced based on new training data during operational use, although the model can be updated periodically for improved performance.

• Testing phase: During testing, the system is given unseen sample data and must determine the answers without the answer class being specified for the input packets. The test dataset contains the genuine answers, which are compared with the model's predictions to assess accuracy and other performance measures.

• Reducing false alarms: Machine learning systems require training to prevent false alarms, but human oversight and continuous learning are necessary for system accuracy and effectiveness over time.

Figure 6: Logical layout of the system for IDS
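The layout above (pre-processing, classification, training phase, testing phase) can be sketched end to end. This is a minimal illustration, not the paper's implementation: the packet features are invented, and a simple nearest-centroid rule stands in for the classifiers surveyed later.

```python
# Minimal sketch of the IDS layout: preprocess -> train -> classify.
# A nearest-centroid rule stands in for a real ML classifier, and the
# packet records below are invented for illustration.
def preprocess(packet):
    # Extract numeric features from a (hypothetical) sniffed packet.
    return [packet["duration"], packet["bytes"] / 1000.0]

def train(records, labels):
    # Training phase: learn one centroid per response class.
    centroids = {}
    for lab in set(labels):
        rows = [r for r, l in zip(records, labels) if l == lab]
        centroids[lab] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def classify(centroids, features):
    # Classification phase: assign the nearest class centroid.
    def dist(lab):
        return sum((a - b) ** 2 for a, b in zip(centroids[lab], features))
    return min(centroids, key=dist)

train_pkts = [{"duration": 1, "bytes": 500}, {"duration": 2, "bytes": 700},
              {"duration": 40, "bytes": 90000}, {"duration": 35, "bytes": 80000}]
train_labels = ["normal", "normal", "attack", "attack"]
model = train([preprocess(p) for p in train_pkts], train_labels)

# Testing phase: prediction on an unseen packet vs. the genuine answer.
test_pkt = {"duration": 38, "bytes": 85000}
print(classify(model, preprocess(test_pkt)))  # attack
```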
Figure 5: A snapshot of the Intrusion Detection System

1.3. The core structure of an intrusion detection system (IDS)

Detecting intrusions is a strategic method that monitors network or digital infrastructure operations and alerts on potential hazards. The intrusion detection approach utilizes computational and smart techniques to identify potential intrusions through pattern identification. The intrusion detection system, a smart computer, automates the process by executing an identification model and collaborating with other IDS components. Figure 7 symbolizes the comprehensive structure of an intrusion detection system that facilitates inter-IDS component communication. The core structure of IDS includes the classification type known as the collaborative behavior of an IDS; its foundation is each IDS's interaction and monitoring module.

Figure 7: The core structure of an intrusion detection system (IDS)

1.4. Challenges of IDS

Intrusion detection systems are security devices that monitor network activity and computer networks to identify suspicious activity and system misuse [5]. IDSs have recently become recognized as one of the core safety mechanisms that enterprises need to implement in their IT infrastructures. When deployed with other security services, IDSs can form a layered security framework; for instance, many organizations integrate IDS with antivirus and firewall software. IDSs can be employed in such a way to detect assaults that conventional security products are unable to identify.

When implementing IDS mechanisms, these particular features are crucial for an effective assault solution:

• System robustness and predictability.
• Speedy identification.
• Low false positives.
• Optimal identification accuracy.
• Reduced hardware and software prerequisites.
• Accurate intrusion location detection.
• Compatibility with other modern technology.

In conclusion, to identify assaults with outstanding accuracy and promptness, an intrusion detection system (IDS) needs to include all of the features mentioned above.

2. Related Work to Current Datasets

This section outlines the models of machine learning utilized in the study and presents a summary of some of the most recent datasets in the IDS domain.

The IDS field faces challenges in dataset availability due to privacy and security concerns. Cybersecurity researchers have created numerous high-quality datasets for anomaly-based IDS research, which are introduced in this section. In this part, the KDDCUP'99, NSL-KDD, CICIDS-2017, UNSW-NB15, and CSE-CIC-IDS2018 datasets are briefly described, along with some of their key features and traits.

2.1. The KDDCUP'99 dataset

The KDDCUP'99 data set is frequently used for developing and assessing anomaly detection techniques for IDS. In addition to the standard category (network traffic without any intrusions), attacks are classified into one of four types, as shown in Figure 8 [6].

• Denial of Service (DoS) attacks: flooding a network, leading to service interruptions (e.g., SYN Flood).
• Remote-to-Local (R2L) attacks: accessing a remote computer without authorization (e.g., guessing a password).
• Probe attacks: networks breached for surveillance (e.g., port scanning).
• User-to-Root (U2R) attacks: a hacker tries to escalate a regular user account to root privileges (e.g., various "buffer overflow" attacks).

Figure 8: Four different types of assault categorization

The dataset used for the Third International Knowledge Discovery and Data Mining Tools competition aimed to develop a network intrusion model categorizing connections as normal or intrusive. It consisted of 4,898,431 samples with 41 features, as shown in Table 1. The initial data set included 22 attacks, categorized into four types: DoS, R2L, U2R, and probing. All attacks are classified as abnormal, requiring intrusion detection to determine whether a record is normal or abnormal. The KDDCUP'99 data set is a heterogeneous collection of 41 features with textual and numerical values, necessitating data normalization. Text attribute values are often replaced with numerical values, some of which have a large range and others only the values 0 and 1, such as those marking a record as normal or an assault. Records are then split into normal and abnormal cases, setting the normal records' property value to +1 and attack records to -1, which scales the data collection properly and speeds up computing. The KDDCUP'99 dataset, despite its diverse applications, suffers from poor performance, lengthy training, and subpar intrusion detection due to its high skewness, heavily duplicated records, and class imbalances [7].

Table 1. The number of cases per attack category within the KDDCUP'99 dataset

Sets           Network traffic   Authentic records   Unique records
Training Set   Attacks           3,925,650           262,178
               Normal            972,781             812,814
               Total             4,898,431           1,074,992
Testing Set    Attacks           246,150             29,378
               Normal            60,591              47,911
               Total             306,741             77,289

2.1.1. KDD CUP 1999 dataset

This serves as the first dataset investigated, as depicted in Figure 9.

• Pre-processor: an approach for feature selection is set up to eliminate anomalies and minimize the entropy of the data.
• Hyper-parameter optimization: the training dataset is transformed into a higher-dimensional space using the kernel function, ensuring linear separability by tuning the most important hyper-parameters, such as gamma. Parameter optimization is a critical aspect of improving the performance of an IDS.
• ML classifiers: the input dataset is categorized into various groups to facilitate efficient intrusion detection.
• Training phase: the model is trained to differentiate between legitimate and malicious data.
• Testing phase: the suggested system is evaluated using a separate test dataset.
• Evaluate optimum accuracy and speed: the categorization system's accuracy and processing speed are also assessed.

Figure 9: The architecture of the KDD CUP'99 assault dataset

2.2. The NSL-KDD dataset

The NSL-KDD dataset was introduced in 2009 to address issues with the KDDCUP'99 dataset. It is smaller, easier to maintain, and has minimal model bias because the skewness of records is reduced: the number of sampled records from each group is inversely proportional to its share of the original dataset, as shown in Table 2. The NSL-KDD dataset inherited most of its parent dataset's characteristics, categorizing data as normal or attack. It has 41 characteristics categorized as basic, content, or network traffic features, with 30 traffic classes in the testing set and 23 in the training set [8].
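The preparation steps described for the KDD-family data in Section 2.1 (replacing text attribute values with numbers, scaling wide-ranging columns, and labelling normal records +1 and attacks -1) can be sketched as follows. The three records and field names are invented stand-ins, not actual KDDCUP'99 rows.

```python
# Sketch of KDD-style preparation: map symbolic fields to numbers,
# min-max scale wide-ranged columns, label normal +1 and attack -1.
# The records below are invented for illustration.
raw = [
    {"protocol": "tcp",  "src_bytes": 181,  "label": "normal"},
    {"protocol": "udp",  "src_bytes": 5450, "label": "smurf"},
    {"protocol": "icmp", "src_bytes": 520,  "label": "neptune"},
]

proto_codes = {p: i for i, p in enumerate(sorted({r["protocol"] for r in raw}))}
lo = min(r["src_bytes"] for r in raw)
hi = max(r["src_bytes"] for r in raw)

encoded = []
for r in raw:
    encoded.append({
        "protocol": proto_codes[r["protocol"]],          # text -> numeric
        "src_bytes": (r["src_bytes"] - lo) / (hi - lo),  # scaled to [0, 1]
        "target": 1 if r["label"] == "normal" else -1,   # normal vs. attack
    })

print(encoded[0]["target"], encoded[1]["target"])  # 1 -1
print(round(encoded[1]["src_bytes"], 3))           # 1.0
```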

2.2.1. Assaults that are incorporated in the dataset produced by the NSL-KDD

The following provides summaries of the four distinct attack categories that the dataset addresses.

• Denial of Service (DoS): this kind of attack overloads a server with malicious requests, causing it to refuse to serve genuine traffic (e.g., SYN Flood).
• User-to-Root Attack (U2R): an assault where an attacker exploits a user account to gain administrative or root privileges (e.g., Buffer Overflow Attack).
• Remote-to-Local Attack (R2L): an attack in which an attacker transfers data via a network and deceptively gains local access to a machine to execute an exploit (e.g., Password Guessing).
• Probing: this type of attack scans a network to identify its specifics and weaknesses, allowing attackers to exploit these vulnerabilities for further attacks (e.g., Port Scanning).

Table 2. The number of cases per attack category within the NSL-KDD datasets

Data Set         Total Records   Normal   DoS      Probe    U2R   R2L
20% KDD Train+   25,192          13,449   9,234    2,289    11    209
KDD Train+       125,973         67,343   45,927   11,656   52    995
KDD Test+        22,544          9,711    7,458    2,421    200   2,754

The research is categorized into topics based on training and testing data, initial pre-processing, ML methods, and model outcomes. The data is initially pre-processed, normalized, and feature-selected, as depicted in Figure 10. Pre-processing involves converting characteristics to appropriate formats and feeding them to the machine learning algorithms; the model design is then created and trained, tested against test data, and different anomaly detection rates are achieved using the various algorithms. Performance is then compared with other models for final classification.

Figure 10: A scenario of an extensive investigative workflow employing the NSL-KDD data set for the detection of intrusions

2.3. The CICIDS-2017 dataset

The CIC-IDS2017 dataset [9], created by the Canadian Institute for Cybersecurity in 2017, includes various operating systems, protocols, and attack types. It simulates the behaviour of 25 users and 2016's most frequent attacks, including port scans, infiltrations, Heartbleed attacks, botnet attacks, DoS attacks, and DDoS attacks. The dataset was made publicly accessible as CSV files on the University of New Brunswick's website [10]. However, it has a high class-imbalance problem [11], with over 70% of traffic being benign.

The CICIDS-2017 raw dataset undergoes pre-processing activities, including data creation, feature selection, and standardization. 70% of the data is used for training, while 30% is used for testing. The trained IDS is then evaluated using the testing dataset, with outcomes judged on efficiency, as depicted in Figure 11.

Figure 11: The suggested architecture of the CICIDS-2017 assault dataset

2.3.1. CICIDS-2017 dataset specifications

Data was gathered for a total of five days: the event ran from Monday, July 3, 2017, at 9 a.m. to Friday, July 7, 2017, at 5 p.m., and several assaults were conducted during this time. Only normal daily traffic is present on Monday; on Tuesday, Wednesday, Thursday, and Friday, the attacks took place in the morning and the afternoon. Brute Force FTP, Brute Force SSH, DoS, Heartbleed, Web Attack, Infiltration, Botnet, and DDoS are among the attacks used, as shown in Table 3.

Table 3. The total number of flows and assaults within the CICIDS-2017 dataset

Day                  Total Flows   No. of Assaults   Assault Type
Monday               529,918       0                 Normal network activities
Tuesday              445,909       7,938             FTP-Patator
                                   5,897             SSH-Patator
Wednesday            692,703       5,796             DoS Slowloris
                                   5,499             DoS Slowhttptest
                                   231,073           DoS Hulk
                                   10,293            DoS GoldenEye
                                   11                Heartbleed
Thursday Morning     170,366       1,507             Web Attack-Brute Force
                                   652               Web Attack-XSS
                                   21                Web Attack-SQL Injection
Thursday Afternoon   288,602       36                Infiltration
Friday Morning       191,033       1,966             Botnet
Friday Afternoon     286,467       158,930           Port Scan
Friday Afternoon 2   225,745       128,027           DDoS
Total                2,830,743     557,646           19.70% attacks
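The 70/30 split and subsequent evaluation described above can be sketched as below. The flow labels are synthetic stand-ins for CICIDS-2017 records, and the predict rule is a placeholder for a trained classifier, so the perfect scores it yields are illustrative only.

```python
# Sketch of a 70/30 evaluation with confusion-matrix counts. The
# flows and the "predict" rule are invented placeholders, not a
# trained model on the real CICIDS-2017 data.
import random

random.seed(42)
flows = [(i, "benign" if i % 5 else "attack") for i in range(1000)]
random.shuffle(flows)

cut = int(0.7 * len(flows))          # 70% training, 30% testing
train_set, test_set = flows[:cut], flows[cut:]

def predict(flow_id):                # placeholder for a trained classifier
    return "benign" if flow_id % 5 else "attack"

tp = fp = tn = fn = 0
for fid, truth in test_set:
    pred = predict(fid)
    if truth == "attack":
        tp += pred == "attack"; fn += pred == "benign"
    else:
        tn += pred == "benign"; fp += pred == "attack"

accuracy = (tp + tn) / len(test_set)
false_alarm_rate = fp / (fp + tn)    # the metric IDS work tries to minimise
print(accuracy, false_alarm_rate)    # 1.0 0.0 (oracle-like placeholder)
```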

2.4. The UNSW-NB15 dataset

The publicly accessible UNSW-NB15 dataset has ten classes: Normal, Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms, with 42 features (not including the labels). Its testing set has 82,332 records, while the training set contains 175,341. The UNSW-NB15 training and testing sets' classes are unbalanced as well. This section details the configuration of the synthetic environment and the production of UNSW-NB15, focusing on the testbed configuration and the entire process of producing the dataset [12].

Figure 12 depicts the system's conceptual layout as suggested in the present research. The processing of data covers the entire first phase; information engineering is the term most frequently employed to describe the procedure. Normalization, feature selection, and data cleaning are the three stages of information processing. Using the training data, the model undergoes training after the appropriate feature or set of attributes has been chosen. Next, the validation set is employed to confirm the accuracy of the trained model. Furthermore, the validated model is verified using the experimental results. Consequently, passing through the described method generates a customized and suitable model and results in the detection of intrusions.

Figure 12: The assault dataset's recommended architecture for the UNSW-NB15 dataset

2.4.1. Overview of the UNSW-NB15 dataset

By integrating the majority of contemporary reserved assaults, the UNSW-NB15 dataset aims to emulate contemporary network settings. Table 5 lists the ten different categories of traffic included in the dataset: normal, fuzzing, analysis, backdoor, DoS, exploits, generic, reconnaissance, shellcode, and worms. Table 4 provides an extensive overview of these categories in terms of the breakdown of training and testing sets as well as the total number of entries per assault.

Table 4. Testing and training set data distribution

Category         Testing Set   Training Set
Normal           37,000        56,000
Analysis         677           2,000
Backdoor         583           1,746
DoS              4,089         12,264
Exploits         11,132        33,393
Fuzzers          6,062         18,184
Generic          18,871        40,000
Reconnaissance   3,496         10,491
Shellcode        378           1,133
Worms            44            130
Total Records    82,332        175,341

Table 5. Attack categories of the UNSW-NB15 dataset

Traffic Type     Overview
Normal           A threat-free flow of traffic.
Fuzzing          The approach automatically identifies "hackable" software flaws by randomly inserting numerous data variants into a target program until one of these variations uncovers a vulnerability.
Analysis         Examples of this broad category include spam, port scanning, and HTML file penetration.
Backdoor         Illegitimate software that circumvents conventional security controls to grant remote access to systems such as file stores and Dropbox.
DoS              Due to an overload of incorrect authentication attempts or traffic, the network/server may malfunction or hang, preventing authorized users from using online services.
Exploits         Malware frequently contains code that exploits software flaws or security holes to spread easily and rapidly.
Generic          A collision attack on the ciphers' encryption keys; applies to all block ciphers.
Reconnaissance   A set of user-friendly techniques, such as Nmap, used to gain knowledge about a specific internet host or system.
Shellcode        A bug introduces statements or instructions into a program that give the attacker direct access to its registers and functions.
Worms            Malicious software that duplicates itself, consuming excessive amounts of system memory and internet connection bandwidth and reducing system stability.

The UNSW-NB15 dataset is a key tool in research for developing and testing machine learning-based intrusion detection systems, aiming to enhance detection accuracy, reduce false positive rates, and address data imbalance issues.
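One common response to this class imbalance is to weight classes inversely to their frequency during training. Below is a minimal sketch using the Table 4 training counts and the widely used n_samples / (n_classes * n_class) heuristic; the weighting scheme is our illustration, not a step the dataset authors prescribe.

```python
# Sketch of "balanced" class weights for the UNSW-NB15 training set.
# Counts come from Table 4; the weighting heuristic is
# n_samples / (n_classes * n_in_class).
train_counts = {
    "Normal": 56000, "Generic": 40000, "Exploits": 33393,
    "Fuzzers": 18184, "DoS": 12264, "Reconnaissance": 10491,
    "Analysis": 2000, "Backdoor": 1746, "Shellcode": 1133, "Worms": 130,
}

n = sum(train_counts.values())       # 175,341 training records in total
k = len(train_counts)                # 10 traffic classes

weights = {c: n / (k * cnt) for c, cnt in train_counts.items()}

# Rare classes receive proportionally larger weights during training.
print(round(weights["Normal"], 3))   # 0.313
print(round(weights["Worms"], 1))    # 134.9
```

Passing such weights to a cost-sensitive classifier penalises mistakes on rare classes like Worms (130 records) far more than on Normal (56,000 records), one of the standard ways to counter the imbalance noted above.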

2.5. The CSE-CIC-IDS2018 dataset

This section explains the types of data used to implement intrusion detection. The dataset provides genuine and rapidly changing information collected on the Amazon AWS platform by the Communications Security Establishment (CSE) and the Canadian Institute for Cybersecurity (CIC), and it depicts live traffic over the network [13]. For analyzing intrusion detection methods that target network abnormalities, it is regarded as one of the most reliable sources of input [14]. Recent attacks from 10 categories include Infiltration, Web, Benign, Bot, FTP-BruteForce, SSH-BruteForce, DDoS attack-HOIC, DDoS attack-LOIC-UDP, DoS attacks-GoldenEye, and DoS attacks-SlowHTTPTest [15]. Table 7 provides a comprehensive breakdown of each assault class and its originally collected volume. The attacked organization comprises 30 servers, 420 machines, and 5 departments, and the assault architecture comprises 50 devices [16]. The CICFlowMeter-V3 program was executed to extract 80 attributes from the captured data [17]. Table 6 presents various attributes derived from the network traffic flow. Figure 13 depicts the IDS methodology applied in the studies; more specifically, there are four steps in the method: 1) dataset staging; 2) pre-processing; 3) training; and 4) testing.

The CSE-CIC-IDS2018 dataset is a crucial tool for developing, testing, and validating IDS models, particularly those using machine learning techniques, serving as a benchmark for evaluating detection accuracy, false positive rates, and generalization capabilities. It is a realistic benchmark for evaluating intrusion detection systems, encompassing 80 features from normal and malicious network traffic, and is widely used in modern IDS research.

Table 6. Attributes of the CSE-CIC-IDS2018 dataset discussed

Attribute name   Description of attribute
down_up_ratio    The split between downloads and uploads.
Fl/dur           Length of the flow.
fw/pkt/avg       The mean size of a packet travelling forward.
fw/act/pkt       The number of forward packets with at least one byte of TCP payload.
fw/pkt/std       The standard deviation of forward packet size.
tot/bw/pk        Total number of packets travelling in the reverse direction.
tot/fw/pk        Total number of packets delivered forward.
Pkt/len/var      The variance of packet length.
bw/pkt/max       Maximum payload size in the reverse direction.
bw/pkt/min       Minimum payload size in the reverse direction.
fw/win/byt       The number of bytes transmitted in the initial window in the forward direction.
bw/win/byt       The number of bytes transmitted in the initial window in the reverse direction.
bw/hdr/len       Total bytes used for headers in the backward direction.
Fw/hdr/len       Total bytes used for headers in the forward direction.
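A few of the Table 6 flow attributes can be sketched directly from raw packet records. The five-packet flow below is invented, and the byte-based reading of down_up_ratio is our assumption; CICFlowMeter-V3 computes 80 such features from real captures.

```python
# Sketch of deriving some Table 6 flow attributes from raw packets.
# Each packet is (direction, payload_bytes); the flow is invented.
packets = [
    ("fwd", 120), ("fwd", 80), ("bwd", 1500), ("bwd", 1500), ("bwd", 900),
]

fwd = [size for d, size in packets if d == "fwd"]
bwd = [size for d, size in packets if d == "bwd"]

tot_fw_pk = len(fwd)                 # packets delivered forward
tot_bw_pk = len(bwd)                 # packets in the reverse direction
fw_pkt_avg = sum(fwd) / len(fwd)     # mean forward packet size
down_up_ratio = sum(bwd) / sum(fwd)  # download/upload split (assumed bytes)
bw_pkt_max = max(bwd)                # maximum reverse payload size

print(tot_fw_pk, tot_bw_pk)          # 2 3
print(fw_pkt_avg, bw_pkt_max)        # 100.0 1500
print(round(down_up_ratio, 2))       # 19.5
```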

Figure 13: The CSE-CIC-IDS2018 attack dataset's proposed architecture

Table 7. Volume and percentage of data points by assault classification (ratios are relative to the initial 1,252,835 data rows)

No.   Assault Category   Data points   Ratio
1     Benign             971,016       77.505 %
2     Infiltration       38,703        3.089 %
3     DoS attacks-Hulk   37,323        2.979 %
Table 7 (continued):

4          Bot                        137185                    10.95 %
5          DDOS attack-HOIC           57507                     4.59 %
6          DDOS attack-LOIC-UDP       8377                      0.669 %
7          FTP-BruteForce             2234                      0.178 %
8          DoS attacks-GoldenEye      332                       0.026 %
9          DoS attacks-SlowHTTPTest   103                       0.008 %
10         SSH-Bruteforce             55                        0.004 %

3. Literature Review

Over the last ten years, many different IDSs have been developed, evaluated, and reviewed using publicly accessible sources of information. Several published reviews and comparative research studies address both the ways machine learning is applied in developing IDSs and the building blocks of IDSs for various applications.

A review of the research published by Verma et al. [18] reveals that anomaly-based intrusion detection can be made more efficient, specifically with respect to the degree of false positives. On the NSL-KDD dataset, extreme gradient boosting (XGBoost) and adaptive boosting (AdaBoost) learning techniques were executed both with and without clustering methods. Even with an accuracy of 84.25%, hybrid or ensemble machine learning classifiers still need to be implemented to boost efficiency.

Dutt et al. [19] developed a hybrid intrusion detection system that identifies novel assaults through anomaly (suspicious-activity) detection and prevalent assaults through misuse detection. The anomaly identification strategy improved the algorithm's accuracy to 92.65%, and false negatives decreased as usage increased. However, the slow detection rate persists on massive, high-dimensional data. The daily retrained model reduced the false-negative percentage to 7.35, enhancing true positive rates and decreasing false negative rates in the misuse detection system.

Perez D. et al. [20] proposed a hybrid network-based intrusion detection system (IDS) using supervised and unsupervised machine learning strategies. They integrated artificial neural networks with feature selection and K-means clustering. The study found that the best IDS performance is achieved when SVM and K-means are combined with feature selection. To minimize false positives, hybrid technology models need to be developed.

Maniriho et al. [21] studied intrusion detection on two datasets, NSL-KDD and UNSWNB-15, with a single machine-learning algorithm and an ensemble approach. The study found superior performance, with misclassification rates of 1.19 percent and 1.62 percent, and suggests that future research focus on improving data size and dimensionality.

Ahmad Iqbal et al. [22] implemented feedforward and pattern-recognition neural networks for IDS, using scaled conjugate gradient methods and Bayesian regularisation. Both models performed well, with the feed-forward artificial neural network achieving an accuracy of 98.0742%. Further evaluation on multiple datasets is needed.

Watson et al. [23] developed an anomaly-based intrusion detection approach using the SVM algorithm to identify malicious code at the cloud-services hypervisor level. The approach achieved a 90% accuracy rate in anomaly identification, suggesting that the SVM approach could be implemented for malware identification at the lowest computational cost.

In 2019, Kaja, Shaout, et al. [24] developed a two-stage intrusion detection system (IDS) that uses an unsupervised model, k-means clustering, to identify suspicious behavior, followed by supervised approaches such as Naïve Bayes, random forests, and decision trees to classify harmful behavior. The system achieved an accuracy of 92.74% to 99.97% on the KDD99 dataset.

Watson et al. [25] developed an anomaly-based intrusion detection approach using the SVM algorithm to identify malicious code at the cloud-services hypervisor level. The approach achieved an accuracy rate of 90% in anomaly identification using system-based properties, suggesting that the SVM approach could be implemented for malware identification at the lowest feasible computational cost.

Kasongo SM and Sun Y [26] developed five supervised models: Artificial Neural Network (ANN), Decision Tree (DT), K-Nearest-Neighbour (KNN), Support Vector Machine (SVM), and Logistic Regression (LR). They used Extreme Gradient Boosting (XGBoost) to reduce feature vectors from 42 to 19 and tested on the UNSW-NB15 dataset. DT models improved accuracy in binary classification tasks with fewer features, from 88.13% to 90.85%.

The study [27] suggests an improved method for detecting intrusion attacks using machine learning algorithms on the KDDCUP 99 dataset. The J48, J48Graft, and Random Forest algorithms outperform other methods, with a detection rate of over 96%. WEKA is used to verify the dataset's accuracy, considering parameters like ROC, F-measure, precision, and recall. Future improvements could involve artificial intelligence and neural networks to reduce false alarm rates.

Researchers [28] have developed a machine learning-based method for detecting intrusions in computer networks. The approach uses parameter optimization, feature selection, pre-processing, and classification with models such as Support-Vector Machines (SVM), Random Tree, AdaBoost, and K-Nearest Neighbour (KNN). Tested on large datasets like NSL-KDD and CICDDOS2019, the method outperformed other algorithms and achieved high detection rates. The study highlights the importance of DR metric values above 99% for intrusion categorization.

The study [29] uses five machine learning-based computational models (Naive Bayes, Decision Tree, K-Nearest Neighbour, Random Forest, and Support Vector Machine) along with two deep learning models, the Multilayer Perceptron (MLP) and Long Short-Term Memory (LSTM). On the NSL-KDD dataset, accuracy reached 97.77% with LSTM, 96.89% with MLP, 89.6% with normalization, and 89.2% without normalization. The neural network models outperform conventional models in detection accuracy and incursion detection. Future improvements involve reducing the imbalance ratio and raising average accuracy.

Researchers [30] used machine learning and meta-heuristic algorithms to improve intrusion detection performance on the NSL-KDD dataset. They applied methods such as Random Forest, Classification and Regression Trees, Support Vector Machine, and Multilayer Perceptron, maximizing hyper-parameter tuning. The study evaluated effectiveness using metrics like accuracy, precision, recall, and F1-score. The results showed that genetic algorithms achieved 96% accuracy, highlighting the efficiency of machine learning in cybersecurity and the potential of meta-heuristic algorithms in optimizing IDS models.

The research [31] introduces a hybrid data optimization-based intrusion detection system called DO IDS, which combines feature selection and data sampling. It removes outliers, optimizes sampling ratios, and selects the best training dataset using the Random Forest classifier. The system is constructed using the ideal training dataset and the features chosen during feature selection. DO IDS outperforms other algorithms in identifying anomalous behaviors, scoring 92.8% on the UNSW-NB15 intrusion detection dataset.

The study [32] evaluates advanced intrusion detection techniques and machine learning methods. It identifies the most effective techniques for categorizing attacks: binary classification, multiclass classification, k-nearest-neighbor, and Random Forest. Binary classification has the highest accuracy, with results ranging from 0.9938 to 0.9977. Multiclass classification outperforms the k-nearest-neighbor method, with a score of 0.9983. Random Forest's binary classification achieves the highest score of 0.9977. The study also highlights the potential of machine learning to improve system accuracy and reduce false negatives. Multiclass classification yields the best results, and distinguishing between assault types can yield more useful results.

The research [33] shows that IKPDS (Indexed Partial Distance Search K-nearest Neighbor) is a fast KNN algorithm that shortens classification completion time while maintaining accuracy and error rate for various attack types. It achieved 99.6% accuracy on 12597 cases with real class labels, indicating that feature selection techniques can increase accuracy and save calculation time for DoS and probe attacks. The proposed algorithm maintains the same classification accuracy while taking less computing time than conventional KNN and PKDS.

Researchers [34] have developed a random forest classifier model for intrusion detection systems that outperforms conventional classifiers. Tested on the NSL-KDD data set, the model showed high detection rates and low false alarm rates. The model identified four types of attacks and underwent feature selection to reduce dimensionality. The proposed approach achieved 99.67% accuracy without feature selection, outperforming the J48 classifier, which achieved 99.26%. The model demonstrated a low false alarm rate, high DR, and good accuracy.

Researchers [35] have introduced the cluster center and nearest neighbor (CANN) technique for feature representation, which combines two distances: between each data sample and the cluster center, and between the sample and its closest neighbor within the same cluster. This one-dimensional feature allows a K-Nearest Neighbour classifier to detect intrusions with 99.76% accuracy, outperforming or comparable to K-NN and support vector machines on the KDD-Cup 99 dataset.

The study [36] uses machine learning algorithms to determine data intrusion rates. Support vector machines (SVMs) and artificial neural networks (ANNs) are used to identify abnormalities or authorization issues; if malicious material is found, the request is discarded. Chi-squared and correlation-based feature selection techniques are used to minimize irrelevant data. The models are tested on a pre-processed dataset to improve prediction accuracy. The SVM algorithm achieved 48% accuracy, while the ANN model achieved 97%. Using an ANN significantly improved intrusion detection accuracy.

The research [37] presents a method for creating effective IDS using random forest classification and Principal Component Analysis (PCA). The strategy outperforms other methods like SVM, Naïve Bayes, and Decision Trees in terms of accuracy, with an accuracy rate of 96.78%, an error rate of 0.21%, and a running time of 3.24 minutes.

Researchers [38] propose ML algorithms for classification, including NB, RF, J48, and ZeroR, and apply K-means and EM clustering to the UNSW-NB15 dataset. The RF and J48 algorithms yielded the best results, with 97.59% and 93.78% accuracy, respectively.

Researchers [39] have developed a network intrusion detection model using the NSL-KDD dataset and compared different machine learning techniques. The models achieved accuracy scores of 98.088%, 82.971%, 95.75%, and 81.971% when evaluated separately. When combined with an inference detection model, the accuracies changed to 98.554%, 66.687%, 97.605%, and 93.914%. The study found that three out of four machine learning approaches significantly improved performance when combined with the inference detection model.

Researchers [40] use recursive feature elimination to remove unnecessary features from the KDD CUP 99 dataset and compare four classifier models: Random Forest, SVMr, AdaBoost, and LDA. AdaBoost offers the highest sensitivity and specificity, at 99.75% and 95.69%, respectively. This system enhances traffic identification accuracy, increases detection rate, and reduces computation cost through feature selection. AdaBoost's sensitivity and specificity make it a superior model, with a significantly higher detection rate.

The research [41] explores feature selection techniques for network traffic data for intrusion detection, focusing on discrete target values and continuous input features. A new method is developed, achieving 99.9% accuracy in distinguishing benign and DDoS signals. The study aims to improve the understanding of network traffic data and develop robust detection systems by addressing the gap between discrete target variables and continuous input properties.

Researchers [42] found that robust SVM neighbor classification improves detection accuracy in network packet sequences, removes noise, and lowers false alarm rates. The intrusion detection rate can reach 87.3% with a false alarm rate of 0, and 100% with a false alarm rate of 2.8%.

The authors [43] have developed a novel ensemble intrusion detection system that combines decision trees, random forests, extra trees, and XGBoost algorithms. The Python-based system significantly improves detection accuracy using metrics like

precision, recall, and f1-score, outperforming state-of-the-art setups. The system's performance was evaluated using the CICIDS2017 dataset, and future advancements could enhance intrusion detection and assault efficacy. The ensemble method's effectiveness is demonstrated through its performance on the CICIDS2017 dataset.

Researchers [44] have developed a unique Intrusion Detection System (IDS) method using artificial intelligence, specifically machine learning, to identify anomalies in computer networks. The model uses a Support Vector Machine (SVM) classification model with two kernels and the recent UNSWNB-15 dataset for training and evaluation. The model achieved a 94% detection rate after six measures, with 94% and 93% accuracy, respectively. Multi-class categorization is proposed for independent detection of infiltration types.

Researchers [45] have developed an intrusion detection system using machine learning, radial basis function (RBF), and multi-layered perceptron (MLP) approaches. The model's parameters are optimized using backpropagation learning and the NSL-KDD dataset. Performance metrics like accuracy, sensitivity, specificity, and the confusion matrix are assessed. The study found that PCA-derived features give lower false alarm rates and higher detection and accuracy rates, and that RBF-based intrusion detection systems offer greater accuracy than MLP-based systems, making them effective in real-world scenarios.

Researchers [46] used the UNSW-NB15 dataset to train machine learning classifiers such as K-Nearest Neighbours, Naïve Bayes, Random Forest, SGD, and Logistic Regression. They used a taxonomy considering both eager and lazy learners, and Chi-Square to eliminate redundant features. The study evaluated the effectiveness of the various classifiers for intrusion detection on the UNSW-NB15 dataset. The RF classifier outperformed the other classifiers with a 99.57% accuracy rate, and with some features alone reached 99.64%.

The researchers [47] employ a wrapper method with logistic regression and a genetic algorithm to identify the optimal feature subset for network intrusion detection systems. They use the KDD99 and UNSW-NB15 datasets and three decision tree classifiers to evaluate the effectiveness of the chosen feature subsets. The KDD99 dataset demonstrated high classification accuracy, with 99.90% accuracy, 99.81% DR, and 0.105% FAR, while the UNSW-NB15 dataset had the lowest FAR (6.39%) and acceptable accuracy.

The researchers [48] developed a deep-learning model using the NSL-KDD dataset to identify intruder patterns. The model achieved a 90% accuracy rate in identifying harmful network patterns and a 99.94% accuracy rate on user-to-root (U2R) attacks. The F-measure showed a 99.7% accuracy rate, and the precision and recall of U2R were also 99.7%. The study used a random forest classifier for high-precision attacks.

Researchers [49] have studied intrusion detection techniques based on support vector machines (SVMs) using the NSL-KDD dataset. The study found that linear SVM, quadratic SVM, fine Gaussian SVM, and medium Gaussian SVM had total detection accuracies of 96.1%, 98.6%, 98.7%, and 98.5%, respectively, with overall error rates of 3.9%, 1.4%, 1.3%, and 1.5%. The study concluded that fine Gaussian SVM offers the best accuracy and lowest error.

The authors of [50] propose a new feature selection algorithm for Knowledge Discovery and Data (KDD) sets, focusing on dimensionality reduction in fuzzy rough sets using Maximum Dependence Maximum Significance (MDMS). The algorithm uses a modified K-Nearest Neighbourhood-based technique to categorize the data set, improve accuracy, and reduce assaults. The algorithm efficiently identifies intrusion types, providing excellent attack detection and lower false alarm rates. The modified K-NN classifier outperforms other algorithms due to its flexible rule-based decision-making and the fuzzy rules it derives from the Gaussian membership function for decision-making and distance estimation, with a detection accuracy of 98.5%.

The research [51] compares various Intrusion Detection System (IDS) algorithms, including KNN, RF, ANN, CNN, SVM, and a combination of techniques. The suggested method achieves 96.8% accuracy. The paper also examines an IDS based on Deep Q Networks (DQN), demonstrating how such methods can enhance IDS precision and effectiveness.

4. Machine Learning Classifiers for Intrusion Detection System

Intrusion detection systems (IDS) are crucial for maintaining network security by identifying attacks and unauthorized access. Machine learning classifiers significantly enhance the effectiveness and precision of IDS. Several popular ML classifiers for intrusion detection are described below.

4.1. Classification using Support Vector Machine (SVM)

Support Vector Machine (SVM) is a leading learning method for binary data classification. It is primarily based on geometric principles for finding the optimal hyperplane that separates different classes, as depicted in Figure 16. SVM is also used in data privacy to identify breaches, owing to its high generalizability and ability to escape the curse of dimensionality. As a result, SVM is increasingly used for anomalous activity detection and breach detection in data privacy.

Support Vector Machine (SVM) is a supervised learning method that classifies both linear and non-linear data by determining the optimal boundary in high-dimensional spaces, as depicted in Figure 17. It is widely applied in tasks like pattern recognition and anomaly detection, including the detection of intrusions. The training process generally involves preparing the data, training the SVM model, and using it to detect anomalies or intrusions.

Figure 16: An Overview of Support Vector Machine
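The linear decision function f(x) = A · X + b used by SVM can be illustrated with a minimal, self-contained sketch. This is not the implementation behind any of the studies surveyed above: it trains a toy linear SVM by stochastic subgradient descent on the regularized hinge loss, and the two-feature sample points and hyperparameters (C, learning rate, epoch count) are arbitrary assumptions for illustration.

```python
def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=100):
    """Minimize 0.5*||w||^2 + C * sum(max(0, 1 - y*(w.x + b)))
    by per-sample subgradient steps (labels must be +1 / -1)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Hinge term active: move w toward C*yi*xi, decayed by the regularizer.
                w = [wj - lr * (wj - C * yi * xj) for wj, xj in zip(w, xi)]
                b += lr * C * yi
            else:
                # Only the L2 regularizer contributes a subgradient.
                w = [wj - lr * wj for wj in w]
    return w, b

def svm_predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Toy two-feature points: +1 = normal traffic, -1 = anomalous traffic (invented values).
X = [[2.0, 2.0], [1.5, 2.5], [2.5, 1.5], [-2.0, -2.0], [-1.5, -2.5], [-2.5, -1.5]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
preds = [svm_predict(w, b, xi) for xi in X]
```

On this separable toy set the learned hyperplane classifies all training points correctly; a kernelized variant, per equation (1), would replace the dot product with K(x_i, x).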

f(x) = A · X + b        (Linear SVM)
or
f(x) = Σ_{i=1}^{m} a_i y_i K(x_i, x) + b        (1)        (Non-linear SVM)

The following are descriptions:

A: the normal vector of the desired hyperplane, which maximizes the margin between classes and is perpendicular to that hyperplane.
X: the input data fed to the Support Vector Machine.
a_i: the Lagrange multipliers associated with each support vector.
y_i: class labels.
m: the number of support vectors.
b: bias term.
K(x, x_i): kernel function.

Figure 17: The architecture of a system for intrusion detection built on SVM

4.1.1. Pseudo code of SVM

Input: the various types of data collected for training and testing.
Output: the algorithm's accuracy.
For SVM, determine the ideal gamma and cost values.
While (termination criterion is not met) do
Step 1: For every data point, carry out the SVM training phase.
Step 2: Execute the SVM method to evaluate the testing data point.
Step 3: State the SVM kernel as K(x, y), where x and y are objects over the attribute range in each parameter of the training set.
Step 4: Assign one of two classes: 1 – Normal, 2 – Anomaly.
Step 5: End while
Step 6: Return accuracy.

4.2. Classification using K-Nearest Neighbor (KNN)

K-nearest neighbor (KNN) is a non-parametric supervised classifier that uses Euclidean distance [52] as its measure to predict the desired result. It classifies program behavior as normal or intrusive, producing results only when requested. KNN analyzes the K instances of training data closest to the test sample and assigns the most frequently occurring class label.

The KNN classification method categorizes new data into previously observed classes based on the majority class of its nearest neighbors. In Figure 18, the new instance to classify is a black hexagon, with blue squares representing normal behavior and orange triangles representing abnormal behavior.

Figure 18: K-Nearest Neighbors (KNN) Classifier

4.2.1. Pseudo code of KNN

Step 1: Load the training and testing datasets.
Step 2: Initialize k, the number of neighbors.
Step 3: For each test point, repeat the following over all training data points to obtain the predicted class:
• Calculate the distance between training and test data using Euclidean, Manhattan, Minkowski, Chebyshev, cosine, or other metrics (or Hamming distance for categorical variables).
• Sort the computed distances in ascending order.
• Extract the top k rows from the sorted array.
• Find the class that appears most frequently among these rows.
• Return the predicted class.

4.3. Classification using Decision Tree Classifier (DT)

The decision tree algorithm is a widely used classification algorithm that uses a tree-shaped graph to classify objects based on rules applied at the tree's leaves. It is particularly effective for intrusion detection, where connections and users are classified as normal or as attack types based on pre-existing data. Decision trees, which learn from training data and forecast future data, are effective for large data sets and real-time intrusion detection due to their high generalization accuracy [53]. They help create clear security protocols and can be used with minimal processing in rule-based models. A decision tree checks the attribute values of a network traffic profile tuple X with an unknown class label and predicts the class label of the leaf node reached, as depicted in Figure 19.

Figure 19: Decision Tree Classifier
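The KNN steps in Section 4.2.1 translate directly into a short program. The sketch below is a minimal pure-Python illustration under assumed toy data (two-feature points labelled "normal" or "anomaly"); it uses Euclidean distance and majority voting exactly as in the pseudo code, and is not tied to any dataset discussed in this review.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training points nearest to `query`."""
    # Compute Euclidean distances and sort them in ascending order.
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Take the top k rows and return the most frequent class among them.
    top_k = [label for _, label in dists[:k]]
    return Counter(top_k).most_common(1)[0][0]

# Invented toy data for illustration only.
train_X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9], [0.85, 0.95]]
train_y = ["normal", "normal", "anomaly", "anomaly", "anomaly"]
pred = knn_predict(train_X, train_y, [0.15, 0.15], k=3)
```

With k = 3, the query [0.15, 0.15] falls next to the two "normal" points and is labelled accordingly.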

4.3.1. Pseudo code of DT

Input:
T // decision tree
D // input data set
Output:
M // model prediction
Step 1: Select any input feature from the data set.
Step 2: for each t ∈ D do
Step 3: n = root node of T;
Step 4: while n is not a leaf node do
Step 5: obtain the answer to the question on n applied to t;
Step 6: identify the arc from n that contains the correct answer;
Step 7: n = node at the end of this arc;
Step 8: predict t based on the labeling of n;
Step 9: end while
Step 10: Return accuracy.

4.4. Logistic Regression Classification (LR)

Logistic regression is a machine learning algorithm used to predict binary outcomes, such as normal or anomaly, true or false, or whether an event occurs or fails. It uses a categorical dependent variable and independent factors to determine the binary outcome. Logistic regression is used for classification problems, just as linear regression is used for regression problems. The S-shaped logistic function predicts the two possible outcomes (0 or 1) and is calculated within a regression model. LR is a supervised machine learning method used to observe a discrete collection of classes. The logistic function uses the sigmoid function, also known as the cost function, which maps predictions to probabilities [54], allowing the probability of an event to be predicted as shown in Figure 20.

P(B = 1|A) or P(B = 0|A)        (2)

In this case, A is the independent variable and B is the dependent variable. Logistic regression makes use of the sigmoid function:

F(x) = 1 / (1 + e^(-x))        (3)

F(x) returns a value between 0 and 1, where e is the base of the natural logarithm and x is the function's input parameter.

Figure 20: Logistic Regression (LR)

4.5. Naive Bayes Classifier (NB)

The Naïve Bayes model operates on Bayesian principles and relies on the assumption that features are independent of each other when conditioned on the class label. Although it is a straightforward model, it often delivers precise outcomes. Misclassifications in Naïve Bayes can result from noisy data and the model's inherent bias due to its independence assumption [55]. To reduce the effect of noise, using high-quality data is crucial. Unlike clustering methods, Naïve Bayes does not divide data into distinct groups but estimates probabilities based on the assumption that feature dependencies do not exist.

The naïve Bayes model operates on Bayes' theorem together with naive independence assumptions. Bayes' theorem gives the chance of one event occurring when the likelihood of another is known. The mathematical formula of Bayes' theorem:

P(S|T) = (P(T|S) · P(S)) / P(T)        (4)

Where T: the observed facts with unlabelled groups.
S: the hypothesis that the facts T belong to a specific group.
P(S|T): the likelihood of hypothesis S given the presence of state T.
P(S): the likelihood of hypothesis S.
P(T|S): the likelihood of T given state S.
P(T): the likelihood of T.

4.5.1. Pseudo code of Gaussian Naive Bayes

Input: Training dataset Td,
P = (p1, p2, p3, ..., pn) // values of the predictor variables in the testing dataset.
Output: A category for the testing dataset.
Step 1: Read the training dataset Td;
Step 2: Determine the mean and standard deviation of the predictor variables in each class;
Step 3: Repeat
• Using the Gaussian density equation, determine the probability of pi for each category,
until the likelihood of every predictor variable (p1, p2, p3, ..., pn) has been determined.
Step 4: Determine each class's likelihood;
Step 5: Output the class with the most significant likelihood.

4.6. Random Forest Classifier (RF)

The Random Forest model, also known as Random Decision Forest, is a classification technique that uses decision trees and makes decisions based on the majority's recommendations, as depicted in Figure 21. It is composed of an ensemble of decision trees built from bootstrap samples of a training set, with three hyper-parameters established before training. It is flexible for regression and classification issues [56].

Random Forest is a popular intrusion detection technique used to identify anomalies in large datasets. Its strength lies in its ability to handle complex, high-dimensional data, making it effective for detecting unusual patterns. However, it may face challenges with small datasets, potentially leading to overfitting or less reliable performance. Additionally, its computational demands may impact performance in certain scenarios, especially with large datasets.
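The Gaussian Naive Bayes procedure of Section 4.5.1 (per-class mean and standard deviation, a Gaussian density for each predictor, then the class with the largest likelihood) can be sketched as follows. The training values are invented toy numbers, and log-densities are used purely for numerical stability; this is an illustration, not a reference implementation.

```python
import math
from collections import defaultdict

def fit_gnb(X, y):
    """Step 2 of the pseudo code: per-class feature means/stds plus class priors."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    stats = {}
    for label, rows in groups.items():
        means = [sum(col) / len(col) for col in zip(*rows)]
        stds = [math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1e-9
                for col, m in zip(zip(*rows), means)]
        stats[label] = (means, stds, len(rows) / len(X))
    return stats

def gnb_predict(stats, x):
    """Steps 3-5: sum Gaussian log-densities per predictor, pick the best class."""
    best, best_lp = None, -math.inf
    for label, (means, stds, prior) in stats.items():
        lp = math.log(prior)
        for v, m, s in zip(x, means, stds):
            lp += -0.5 * math.log(2 * math.pi * s * s) - (v - m) ** 2 / (2 * s * s)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Invented toy features, e.g. (packets per second, mean payload size).
X = [[1.0, 20.0], [1.2, 22.0], [0.8, 18.0], [5.0, 2.0], [5.5, 2.5], [4.5, 1.5]]
y = ["normal", "normal", "normal", "anomaly", "anomaly", "anomaly"]
stats = fit_gnb(X, y)
```

A query near the first cluster is labelled "normal", one near the second "anomaly"; the log-space sum is equivalent to the product of densities in equation (4) up to the shared evidence term P(T).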
performance in certain scenarios, especially with large datasets.

4.6.1. Pseudo code of Random Forest

Step 1: Randomly select K data points from the given training data set.
Step 2: Create a decision tree for this sample and obtain an estimate from the tree.
Step 3: Select the number N of decision trees to build.
Step 4: Repeat steps 1 and 2.
Step 5: The algorithm selects the most highly voted predicted outcome as the final estimate.

Figure 21: The structural framework of the Random Forest Algorithm

4.7. Extreme Gradient Boosting (XGBOOST) Classifier

XGBoost is a sophisticated ensemble learning technique that enhances the performance of sequential decision tree algorithms through a method known as gradient boosting. It operates within the Gradient Boosting Decision Tree (GBDT) framework and is designed to handle distributed computing efficiently. XGBoost boosts model accuracy by addressing the residuals left by previous models and incorporating both first- and second-order derivatives of the error function in its optimization process. This "boosting" technique combines multiple models to address and correct errors from earlier iterations, resulting in a more robust overall model. Using decision trees as its base learners, XGBoost is scalable and applicable to a variety of tasks, including classification, regression, and ranking, making it effective for improving model performance and generating predictions, as shown in Figure 22.

XGBoost enhances model performance through gradient boosting, regularization to prevent overfitting, and memory optimization techniques. Its efficiency and scalability are further improved by features like data and feature subsampling, making it well suited to large and complex datasets.

4.7.1. Pseudo code of XGBOOST

Input: training and testing dataset.
Output: datasets labeled as either normal or assault, depending on the individual category designation.
Step 1: Evaluate and cleanse the input dataset.
Step 2: Apply the min-max approach to normalize the input dataset.
Step 3: Use the XGBoost feature-importance values when selecting attributes.
Step 4: Using the set of attributes selected in Step 3, develop the ML classifier.
Step 5: Train the ML classifier.
Step 6: Apply the ML optimizer to the classifier built in steps 4 and 5.
Step 7: Apply the k-fold cross-validation method to evaluate the XGBoost classifier model.
Step 8: The algorithm obtains the final prediction.

Figure 22: An overview of the XGBOOST System's architecture

4.8. Adaboost Classifier

AdaBoost is a boosting technique that builds a strong model by combining multiple weak classifiers in a sequential manner. The process involves training weak classifiers iteratively, each time adjusting the focus onto the instances misclassified in the previous iteration. This method enhances the performance of the ensemble by weighting the weak classifiers according to their accuracy, as shown in Figure 23. AdaBoost's approach of integrating numerous weak models helps create a robust classifier that performs well on a variety of datasets. This technique is valued for its ability to improve classification accuracy through iterative refinement and weighted learning.

4.8.1. Pseudo code of Adaboost

Step 1: AdaBoost uses the entire training set but assigns a weight to each instance.
Step 2: The AdaBoost model adjusts the weights of the instances based on their classification errors in the previous iteration.
Step 3: Erroneously classified observations are given a higher weight to increase their likelihood of being correctly classified in the subsequent iteration.
Step 4: Each trained classifier's weight is determined by its accuracy, with higher weights indicating greater accuracy.
Step 5: The procedure is repeated until the entire training set is classified error-free or the maximum number of estimators is reached.
Step 6: The final model is a weighted combination of all the weak learners.
Step 7: The final calculation determines the most accurate prediction.
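The reweighting loop of Section 4.8.1 can be sketched with one-level decision stumps as the weak learners. The single-feature toy data, the exhaustive stump search, and the number of boosting rounds are all assumptions chosen for illustration; this is a generic AdaBoost sketch, not the configuration used in any cited study.

```python
import math

def stump_predict(x, feature, threshold, polarity):
    # A one-split weak learner: +polarity at or above the threshold, else -polarity.
    return polarity if x[feature] >= threshold else -polarity

def adaboost_fit(X, y, n_rounds=3):
    """Steps 1-5: reweight instances each round, favouring misclassified ones."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        best = None  # (weighted error, feature, threshold, polarity)
        for f in range(len(X[0])):
            for t in {x[f] for x in X}:
                for pol in (1, -1):
                    err = sum(w for w, xi, yi in zip(weights, X, y)
                              if stump_predict(xi, f, t, pol) != yi)
                    if best is None or err < best[0]:
                        best = (err, f, t, pol)
        err, f, t, pol = best
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-10))
        # Step 3: raise the weight of misclassified samples, lower the rest.
        weights = [w * math.exp(-alpha * yi * stump_predict(xi, f, t, pol))
                   for w, xi, yi in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
        ensemble.append((alpha, f, t, pol))
    return ensemble

def adaboost_predict(ensemble, x):
    # Step 6: sign of the accuracy-weighted vote of all weak learners.
    score = sum(a * stump_predict(x, f, t, pol) for a, f, t, pol in ensemble)
    return 1 if score >= 0 else -1

# One-feature toy set: -1 below 6.0, +1 at or above (invented values).
X = [[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]]
y = [-1, -1, -1, 1, 1, 1]
ens = adaboost_fit(X, y)
preds = [adaboost_predict(ens, xi) for xi in X]
```

Each round's classifier weight alpha grows as its weighted error shrinks, so accurate stumps dominate the final vote.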

Figure 23: The Adaboost Algorithm's structural architecture

4.9. Artificial Neural Network (ANN)

Artificial neural networks (ANNs) are efficient problem-solving techniques that mimic the human brain's structure, making them more efficient in tasks like pattern recognition than traditional digital computers. ANNs are parallel processors used for storing experimental information and addressing multivariate and nonlinear modeling issues, and they are often used as surrogate or response-surface approximation models [57]. ANNs consist of an input, a hidden, and an output layer, with adjustable weights on the artificial neurons. The activation functions of artificial neurons range from -1 to 1, with logarithmic and tangent sigmoids being popular choices. ANN design involves determining the number of inputs, outputs, hidden layers, and hidden neurons.

The three core layers of an artificial neural network are outlined below:

• Input layer: This layer initializes the data for the network; the actual values from the dataset enter here. Each input supplied by the programmer is accepted by the input layer, whose size is determined by the pre-processed dataset's attributes.
• Hidden layer: All computation is performed in this layer, which functions as a bridge between the input and output layers; an extensive network is composed of three or more such layers. It executes all the computations essential to reveal the hidden patterns and features that contribute to the outcome.
• Output layer: The outcome emerges here (normal or anomaly, according to the attack classifications).

Each node in the input layer is fully linked to every node in the subsequent hidden layer, and so on throughout all remaining layers. As Figure 24 illustrates, the nodes' interconnections can be viewed as a connected graph. The hidden layers transform the input through an assortment of operations that eventually generate the output communicated by the output layer.

After receiving its inputs, the artificial neural network calculates their weighted sum and adds a bias through a transfer function:

Y = ∑_(i=1)^n Wi * Xi + b        (5)

where W represents the weights, b signifies the bias, and Y is the output of the model. The weighted total is fed to an activation function, which determines whether a node fires; only the outputs of activated neurons propagate toward the output layer.

Figure 24: Structure of Artificial Neural Network

4.10. Deep Neural Network (DNN)

Deep Neural Networks (DNNs) enhance Intrusion Detection Systems (IDS) by identifying complex patterns in network data. Data collection, feature extraction, and multiple-layer DNN design are carried out with hyperparameter optimization and techniques to control overfitting. Evaluation focuses on accuracy, recall, and F1-score, and challenges such as data imbalance and high computational demands are addressed. DNNs are deployed for real-time intrusion detection, with regular updates to maintain effectiveness.

Intrusion Detection Systems (IDS) are crucial for network security, identifying and mitigating unauthorized activities. Integrating deep neural networks (DNNs) with IDS enhances their effectiveness by recognizing patterns and anomalies in network data. DNNs can adapt to complex environments, improving accuracy and efficiency in threat detection, and this combination offers robust protection against emerging and sophisticated threats.

DNNs are a powerful technique for IDS because they learn complex patterns from network traffic data [58]. This is implemented by collecting comprehensive datasets; preprocessing through feature selection, normalization, and augmentation; and training with a cross-entropy loss and suitable optimizers. DNNs are then tested on unseen data and deployed in real-time environments, improving detection accuracy and capturing intricate attack patterns.

The detection accuracy of intrusion detection based on the DNN model's overall detection level is the proportion of correctly identified normal and anomaly classes; it underpins a resilient and efficient IDS that can identify and classify unpredictable and abnormal cyberattacks.

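Equation (5) and the activation step can be expressed directly in code; the sigmoid activation and the example weights below are illustrative choices, not values prescribed by the reviewed works.

```python
import math

# Eq. (5) as code: a single artificial neuron computes the weighted sum
# Y = sum_i(W_i * X_i) + b, then passes it through an activation function.

def neuron_output(weights, inputs, bias):
    y = sum(w * x for w, x in zip(weights, inputs)) + bias  # weighted sum + bias
    return 1.0 / (1.0 + math.exp(-y))                       # sigmoid activation

out = neuron_output(weights=[0.4, -0.2], inputs=[1.0, 2.0], bias=0.1)
print(out)  # a value in (0, 1); values near 1 mean the node "fires" strongly
```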
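As a rough illustration of the multiple-layer DNN design described above, the toy forward pass below stacks two hidden layers of such neurons; the layer sizes and random weights are placeholders for parameters a real IDS model would learn from labeled traffic data.

```python
import math
import random

# Toy forward pass through a small "deep" network: two hidden layers of
# neurons, each applying weights, a bias, and a non-linearity to the
# previous layer's output. Weights are random placeholders only.

random.seed(0)

def dense(inputs, n_out):
    """One fully connected layer with ReLU activation."""
    outputs = []
    for _ in range(n_out):
        w = [random.uniform(-1.0, 1.0) for _ in inputs]
        b = random.uniform(-1.0, 1.0)
        z = sum(wi * xi for wi, xi in zip(w, inputs)) + b
        outputs.append(max(0.0, z))  # ReLU keeps only positive signals
    return outputs

def forward(features):
    h1 = dense(features, 8)            # first hidden layer
    h2 = dense(h1, 4)                  # second hidden layer
    z = sum(h2) / len(h2)              # toy output unit (mean of h2)
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid score, read as P(anomaly)

score = forward([0.2, 0.7, 0.1, 0.9])  # four normalized traffic features
print(score)  # here in [0.5, 1) because the mean of ReLU outputs is >= 0
```

In practice such a network is trained end to end (e.g. with cross-entropy loss) rather than evaluated with fixed random weights; the sketch only shows how layers compose.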
5. Table 8: A Ten-Year Comparative Overview of Related Works
Reference Dataset Classifiers Applied Evaluated Metrics With Accuracy Findings
Chakrawarti, NSL-KDD Deep Q-Networks Accuracy "IDS Method Based on Deep Q-
A. ., & and AWID (DQN) Deep Q-Networks (DQN) = 96.8% Networks"
Shrivastava, • Achieves 96.8% accuracy.
S. S.[51] • Showcases elegant techniques for
improved precision.
Mohammad KDD CUP J48, Random TP, TN, FN, FP, Precision Random Forest Classifier Features
Almseidin, 99 Forest, Random J48 = 93.10% • Low RMSE score.
Maen Alzubi, Tree, Decision Random Forest = 93.77% • Lowest false-positive rate.
et al. [59] Table, MLP, Random Tree= 90.57% • Highest accuracy rate of 93.77%.
Naive Bayes, and Decision Table=92.44%
Bayes Network MLP=91.90%
Naïve Bayes=91.23%
Bayes Network=90.73%
Manjula C. NSL-KDD Logistic Precision, Recall, F1-Score Random Forest Classifier's Intrusion
Belavagi et Regression, LR = 0.84 Detection Accuracy
al. [60] Gaussian Naive GNB = 0.79 • Outperforms 99%
Bayes, Support SVM = 0.75 • Allows further study on key
Vector Machine, RFC = 0.99 characteristics
and Random • Supports multiclass classification
Forest. classifiers.
T.Saranyaa, KDD cup Modified K- Accuracy, Precision, Recall, and F-Score ANN, Decision Tree, RF Algorithms
S.Sridevi et 99 means, SVM, J48, Modified K-means = 95.75% for Attack Detection
al. [61] Naïve Bayes, SVM = 98.9% • Performance varies by dataset size
Decision Table, J.48 = 99.12% and application.
PCA-LDA-SVM, Naïve Bayes = 92.7%
Logistic Decision Table = 99.45%
Regression, PCA-LDA-SVM = 92.16%
Decision Tree, Logistic Regression = 98.3%
ANN, LDA, Decision Tree = 99.65%
CART, Random ANN = 99.65%
Forest LDA = 98.1%
CART = 98%
Random Forest = 99.81 %
Kasongo SM, UNSW- ANN, kNN, DT, Accuracy, Precision, Recall, and F1-Score XGBoost Enhances Binary
Sun Y [26] NB15 LR and SVM ANN = 94.49% Classification Scheme
KNN = 96.76% • Test accuracy increases from 88.13
DT = 93.65% to 90.85%.
LR = 93.22%
SVM = 70.98%

Mahmood NSL-KDD Naïve Bayes, TP, TN, FP, FN Decision Tree Classifier
RAR, Abdi A KNN,DT and Accuracy, Precision, Recall, and F-Score Performance
et al. [62] SVM, GA,PSO Naïve Bayes = 90.13% • Outperforms other classifiers in
(Feature KNN = 98.89% accuracy, precision, recall, f-score.
Selection) DT = 99.38% • Optimal feature coupling reduces
SVM = 93.55% model-building time and data
analysis burden.
M. Choubisa, NSL-KDD Random Forest Accuracy, DR, FAR, MCC (Matthew's "Assault Categorization Model
R. Doshi et (Feature correlation coefficient) Improvement"
al. [63] Selection) Dos = 99.69% • Utilized new feature selection
Probe = 99.69% method.
R2L = 99.68% • Improved using random forest
U2R = 99.69% classifier.
Dhanabal, L., NSL-KDD J48, SVM, Naïve J48 SVM NaïveBayes J48's Network Classification
Shantharajah Bayes (Test Normal=99.8 Normal=98.8 Normal=74.9 • Outperforms CFS for data set
[6] Accuracy with 6 DoS =99.1 DoS =98.7 DoS =75.2 classification.
features) Probe= 98.9 Probe= 91.4 Probe= 74.1 • Demonstrates potential for network
U2R =98.7 U2R =94.6 U2R =72.3 classification.
R2L= 97.9 R2L= 92.5 R2L= 70.1

Modi, KDD Bayes Net, Naïve Precision, Recall-measure, and ROC "J48, J48 Graft, Random Forest
Urvashi & CUP99 Bayes,J48, Bayes Net = 0.98 Machine Learning Algorithms: Over
Jain, Anurag J48Graft and Naïve Bayes = 0.96 96% Detection Rate"
[27] Random forest J48 = 0.98
J48Graft = 0.98
Random Forest = 0.98

Mohammadi, KDD FGLCC-CFA, AR, DR, FPR FGLCC-CFA Filter Performance
Sara & CUP99 FGLCC, CFA, Methods DR AR FPR • Outperforms other methods.
Mirvaziri [64] ID3-BA, N- FGLCC-CFA 95.23 95.03 1.65 • Achieves higher AR and DR.
KPCA-GA-SVM, FGLCC 91.32 92.59 2.15 • Lower FPR of 1.65%.
KMSVM CFA 92.05 2.83 3.90
ID3-BA 91.57 92.59 3.16
N-KPCA-GA-SVM 91.21 92.51 2.01
KMSVM 88.71 87.01 -
Verma P, NSL- XGBoost and TP, TN, FP, FN "Enhanced Network Intrusion
Shadab K, et KDD AdaBoost with Accuracy, Precision, Recall, and F1-Score Detection Methods"
al. [18] KMeans XGBoost with KMeans Clustering = 84.25% • Demonstrates accuracy.
Clustering, Adaboost with KMeans Clustering = 82.01% • Uses machine learning for anomaly
XGBoost, XGBoost = 80.23% detection.
AdaBoost Adaboost = 80.73% • Addresses varying probability
XGBoost with K-Means Clustering (Proposed) = distributions.
84.25%
W. L. Al- KDD Multi-level hybrid Accuracy, DR, FAR Modified K-means Enhances KDD
Yaseen, Z. A. CUP99 SVM, Extreme Modified K-means (Proposed) ACC=95.75% Training
Othman et al. Learning Machine DR = 95.17% • Reduces 10% KDD dataset to
[65] (ELM), Modified FAR = 1.87% 99.8%.
• Creates high-quality SVM and
ELM training datasets.
• Enhances multi-level model
detection accuracy.
G. NSL- SVM, ANN Accuracy, Precision, Recall, f1-score SVM vs ANN Intrusion Detection
Yedukondalu, KDD SVM = 48% • SVM: 48% accuracy
G. H. Bindu et ANN = 97% • ANN: 97% accuracy
al. [36]

Tuan-Hong CIC- DT, RF,SVM, NB, Accuracy, Precision, Recall, and F1-Score Experiment Findings:
Chua and IDS2017 ANN,DNN CIC-IDS2017 CSE-CIC-IDS2018 • ANN model optimal for frequent
Iftekhar Salam and the DT = 0.9959 DT = 0.5942 infrastructure upgrades and high
[66] CSE- RF = 0.9967 RF = 0.5949 cyberattack costs.
CIC- SVM = 0.9600 SVM = 0.7559 • DT more effective for systems
IDS2018 NB = 0.7296 NB = 0.4972 without frequent updates or
ANN = 0.9549 ANN = 0.7000 significant attacks.
DNN = 0.9735 DNN= 0.6518 • DT best for training and
categorizing data.
Ghose, NSL- MLP,LSTM,NB, Accuracy, Precision, Recall, and F1-Score NSL-KDD Dataset Accuracy
Dipayan & KDD DT,KNN,RF,SV NB = 75.9% • LSTM (97.77%) and MLP
Partho et al. M DT = 88.2% (96.89%) used.
[29] KNN = 87.0% • Two labels, 41 traffic input features
RF = 89.6% per record.
SVM = 87.6%
MLP = 96.89%
LSTM = 97.77%
Thaseen, I. S., NSL- SVM (Multiclass), TP and FP Rate, Precision, Recall, F-Measure, Experiment on Intrusion Detection
et al. [67] KDD Chi-square ROC area • Model achieved 98% accuracy.
SVM (Multiclass) = 98% • Utilizes SVM and chi-square
feature selection.

K. Dinesh and NSL- RF, Accuracy, Precision, Recall, and F1-Score Intrusion Detection Systems
D. Kalaivani et KDD CART(Classificati CART = 89% Evaluation
al. [30] on and Regression RF = 93% • Utilized meta-heuristic and
Trees), SVM, and SVM = 94% machine learning.
MLP with MLP = 96% • Improved recall, accuracy,
different meta- precision.
heuristic • GA-optimized MLP classifier
algorithms (GS, achieved 96% accuracy.
GBO, SA, and
GA)
Safura A. NSL- RF, DT Accuracy, Precision, Recall, and F1-Score Model's Tolerance Expansion
Mashayak, KDD Accuracy • Model's capabilities expanded to
et.al[68] RF = 99.2% 13 class categorizations.
DT = 99% • Despite additional assault classes,
model performs exceptionally well.

J. Ren, J. Guo, UNSW- Genetic algorithm Accuracy, FAR, Macros precision, Macros DO IDS: RF Classifier-Based Data
W. Qian, et al. NB15 and Random recall, Macros f1-score Optimization
[31] Forest based Accuracy • Outperforms RF classifier in all
feature selection, RF = 86% indicators.
DO_IDS DO_IDS = 92.8% • Scores 92.8%.

K. -A. Tait et UNSW- RF, KNN, SVM, Accuracy, Precision, Recall, f1-score Experimental Results: Multiclass
al. [32] NB 15 Binary, and Accuracy Classification Enhances Intrusion Detection
and Multiclass Binary Class Multiclass • Enables targeted attack response.
CICIDS2 RF = 0.9977 RF = 0.9294
017 SVM (Medium SVM (Medium
Gaussian) = 0.9962 Gaussian) = 0.9338
KNN (Weighted) = 0.9983

Brao, Bobba et NSL-KDD K-Nearest Neighbor Accuracy Innovative Approach: 99.6% Accurate
al. [33] (KNN) KNN = 99.95% Results
• Boosts efficiency

Nabila NSL- Random forest Accuracy, DR, FAR, MCC Model's Effectiveness Experiments:
Farnaaz and KDD (RF) based Accuracy • 99.67% scoring system
M.A Jabbar ensemble RF = 99.67% • Low false alarm rate
[34] classifier • High detection rate.

Lin, Wei-Chao KDD k-Nearest Accuracy Feature Encoding Implementation


& Ke, Shih- CUP99 Neighbor (k-NN), CANN=99.76% • Implemented for assaults and conventional
Wen et al. [35] Cluster Center and KNN=93.87% connectivity.
Nearest Neighbor SVM=80.65% • CANN outperforms 99.76% accuracy.
(CANN), Support • Compatibility with k-NN and support vector
Vector Machine machines.
(SVM)
Bhavani T. T, NSL- Random Forest Accuracy Random Classifier Performance
Kameswara KDD (RF), Decision RF = 95.323% • Best result: 95.323% success rate.
M. R et al. [69] Tree (DT) DT = 81.868% • Easy implementation.

Ponthapalli R. NSL- Decision Tree Accuracy Research Findings:


et al. [70] KDD (DT), Logistic RF=73.784% • Random Forest classifier: Most effective
Regression (LR), DT=72.303% with maximum accuracy of 73.784%.
Random Forest SVM=71.779%
(RF), Support LR=68.674%
Vector Machine
(SVM)
Dutt I. et al. KDD Feature Selection Accuracy Experiment Results:
[19] CUP99 using Chi-Square Feature Selection using Chi-Square • Daily improvement in true positive rate
Analysis, Feature Analysis = 92.65% accuracy.
extraction • Sharp drop in false negative percentage.
Frequency episode
extraction
Maniriho et NSL- Single Machine TPR, FPR, Accuracy, Precision, Mis Ensemble Technique Outperforms Single
al.[21] KDD, Learning (Misclassification rate) Classifiers
UNSWN Classifier NSL-KDD using • Model assessed using two distinct datasets.
B-15 (K- Nearest KNN=98.727%
Neighbor (KNN)), NSL-KDD using RC=99.696%
Ensemble UNSW NB-15 using
Technique KNN=97.3346%
(Random UNSW NB-15 using
Committee (RC)) RC=98.955%
Kazi A., Billal NSL- Artificial Neural Accuracy ANN-based Machine Learning Outperforms
M et al. [71] KDD Network (ANN), ANN = 94.02% SVM in Network Traffic Classification.
Support Vector SVM = 82.34%
Machine (SVM)
A. Aziz, NSL- Breadth-Forest TP, FP, FN, Precision, Recall, F-Score Improved False Positives Percentage
Amira & KDD Tree (BFTree), Accuracy • NBTree and BFTree outperformed J48 and
Hanafi et al. Naïve Bayes BFTree=98.24% RFTree.
[72] Decision Tree NBTree=98.44% • MLP scored highest in DoS and Normal
(NBTree), J48, J48=97.68% classifications.
Random Forest RFT=98.34% • Struggled with R2L and U2R assaults.
Tree (RFT),Multi- MLP=98.53%
Layer Perceptron NB=84.75%
(MLP),Naïve
Bayes
Alkasassbeh, KDD J48 Tress, TP Rate FP Rate Precision ROC Area J48 Classifier's Accuracy Improvement
M & CUP’99 Multilayer Accuracy • Solved low assault detection issue.
Almseidin, M Perceptron (MLP), J48=93.1083% • Achieved highest accuracy rate for KDD
[73] Bayes Network MLP=91.9017% dataset attacks
Bayes Network=90.7317%

Rasane, KDD- KNN.NB, SGD, Accuracy "RF Algorithm Outperforms Other Machine
Komal & Cup 99, DT, RF KNN = 97.8% Learning Methods for Data Categorization"
Bewoor et NSL- NB = 89% • Experimental results show superior
al. [74] KDD, SGD = 94.9% performance on RF.
UNSW- DT = 97.9%
NB 15, RF = 98%
Kyoto
2006+
Anwer, UNSWN J48 and NB Accuracy Experimental Results:
H.M., B-15 J48 = 88% • Use 18 GR ranking features and J48 classifier.
Farouk, M., NB = 76% • Achieve 88% accuracy rate.
et al. [75]
Ravale, KDD Hybrid k-means Accuracy KMSVM Algorithm Outperforms KM and
Ujwala & CUP’99 and SVM-RBF KMSVM = 88.71% SVM
Marathe et • Improves accuracy results.
al. [76]
Khammassi, KDDCup Genetic Algorithm Accuracy, FAR Experiment Results:
Chaouki et 99, (GA) as search and KDD CUP’99 (GALR-DT) = 99.90% • High classification accuracy with 18
al. [47] UNSW- Logistic UNSW-NB15 (GALR-DT) = 81.42% characteristics.
NB15 Regression (LR) • 99.90% DR, 99.81% DR, 0.105% FAR.
as learning • Utilized KDD99 dataset.
algorithm
Kotpalliwar KDD SVM Accuracy "KDD Dataset Analysis"
MV et al. CUP’99 Validation Accuracy = 89.85% • 10% datasets varied in assault types and
[77] Classification Accuracy = 99.9% samples.
• Resulted in "mixed" dataset with 99.9%
accuracy.
A, Anish NSL SVM, Naïve Accuracy, Misclassification Rate "SVM Outperforms Naïve Bayes in Machine
Halimaa; KDD Bayes SVM = 97.29% Learning"
Sundarakant Naïve Bayes = 67.26% • Higher accuracy rate (97.29)
ham, K. [78] • Lower misclassification rate (2.705)
Basheri, NSL SVM, RF, ELM Accuracy, Recall NSL KDD Dataset for Intrusion Detection
Mohammad KDD SVM (Linear) = 99.2% • Utilized for knowledge discovery and data
& Iqbal et al. RF = 97.7% mining.
[22] ELM = 99.5% • ELM outperforms other strategies.
S. Teng, N. KDD Single Type- Accuracy Optimized CAIDM:
Wu, H. Zhu CUP’99 SVM, CAIDM Single Type-SVM = 81.72% • Based on 2-class SVMs and DTs.
et al [80] CAIDM = 89.02% • Enhances accuracy and efficiency.
B. S. Bhati NSL- SVM Accuracy, Error Rate Error Rate Analysis of SVM Detection Accuracy
and C. S. Rai KDD Linear SVM = 96.1% 3.9% • Linear SVM, quadratic SVM, fine Gaussian
[49] Quadratic SVM = 98.6% 1.4% SVM.
Fine Gaussian SVM = 98.7% 1.3% • Medium Gaussian SVM has varying
Medium Gaussian SVM = 98.5% 1.5% accuracy.
• Fine Gaussian SVM offers best accuracy and
minimal error.

D. Gupta, S. NSL- Data mining Accuracy Network Assault Identification


Singhal, et KDD techniques, Linear Linear Regression = 80% • Linear regression: 80% accuracy
al. [81] Regression, K- K-Means Clustering = 67.5% • K-means clustering algorithm: 67.5%
Means Clustering accuracy.
K. Goeschel KDDCup SVM, DT, Naïve Overall Accuracy = 99.62% "Accuracy in Final Phase: 99.62%"
[82] ’99 Bayes techniques • False Positive Rate: 1.57%
• Higher FPR: 4.29%.
Iqbal, A., & NSL- Forward Neural Accuracy, MCC, R-squared, MSE, DR, "Multiple Classifier Combination Enhances
Aftab, KDD Network FAR and AROC Performance"
S.[46] (FFANN), Pattern FFANN=98.0792% • Both models show superior performance in
Recognition PRANN=96.6225% attack detection metrics.
Neural Network
(PRANN)
Shyla,kapil KDD Naïve Bayes, Accuracy, Precision, Recall and F1- KDD Cup99 Dataset Comparison
kumar et al. Cup99 Linear SVM, Score • Compared algorithms' accuracy, precision,
[83] Random Forest Naïve Bayes = 0.971 detection rate.
SVM = 0.994 • Random Forest ranked highest with 0.999
Random Forest = 0.999 detection rate.
Deyban P. NSL- A hybrid model of Accuracy, Error rate, Sensibility, Enhancing IDS Effectiveness
Miguel A. A KDD supervised Specificity, Precision, ROC Curve • Incorporating supervised and unsupervised
[20] (Neural Network SVM+K-Means=96.81% learning methods.
(NN), Support NN+K-Means=95.55%
Vector Machine
(SVM)), and
unsupervised (K-
Means) machine
learning
algorithms.

Aburomman, KDDCup PCA-SVM,LDA- Accuracy Ensemble PCA-LDA-SVM Method:
Abdulla & ’99 SVM,PCA-LDA- PCA-SVM = 0.8902 Outperforms Single Feature Extraction
Reaz et al. [84] SVM LDA-SVM = 0.8993 • Improves accuracy and performance.
PCA-LDA-SVM = 0.9216
Overall-accuracy (ACC) = 0.92162, False-
positive (FP) = 0.0196, False-negative (FN)
= 0.10849
Al-Jarrah, O. NSL, SMLC,AdaBoost Accuracy SMLC Outperforms Supervised Ensemble
Y., Al- Kyoto M1,Bagging,RF NSL KDD Kyoto 2006+ ML on Network Intrusion Datasets
Hammdi et al. 2006+ SMLC = 99.58% SMLC = • Detection accuracy comparable to
[85] 99.39% supervised ensemble models.
AdaBoostM1 = 94.20% AdaBoostM1= • 20% fewer labeled training data instances.
95.88%
Bagging = 99.55% Bagging =
99.39%
RF = 99.62% RF = 99.37%
B. M. Irfan, V. KDD DT,RF,SVM,NN, Accuracy, Precision, Recall, FPR. F1- Deep Learning Models for Intrusion
Poornima et CUP’99, DL models Score Detection
al. [86] NSL KDD CUP’99 NSL-KDD • Convolutional and recurrent neural
KDD DT = 0.85 DT = 0.82 networks enhance detection.
RF = 0.89 RF = 0.85 • Random forests improve recall.
SVM = 0.87 SVM = 0.84
NN = 0.88 NN = 0.86
DL models = 0.90 DL models = 0.88
M. D. Rokade NSL- Naïve Bayes, Accuracy, Precision, Recall, f1-score Experimental Study on Anomaly Detection
and Y. K. KDD SVM, ANN, RF NB = 98% • Uses SVM, Naïve Bayes, ANN.
Sharma [87] SVM = 95% • Demonstrates real-time network
ANN = 95% performance.
RF = 88%

S. Waskle, L. KDD SVM, Naïve Accuracy, Error Rate Proposed Technique Outperforms SVM,
Parashar et al. CUP’99 Bayes, DT, PCA SVM = 84.34% Naive Bayes, Decision Trees
[88] With Random NB = 80.85% • 96.78% accuracy rate
Forest DT = 89.91% • Minimal 3.24 minute performance time.
PCA With RF = 96.78%

M. Hammad, UNSWN Naïve Bayes, J48, Accuracy, Precision, Recall, f1-score, J48 and RF Algorithms: Favorable
W. El-medany B-15 RF, ZeroR FPR, Specificity Outcomes.
et al. [38] NB = 76.04%
J48 = 93.78%
RF = 97.60%
ZeroR = 68.06%
A. Singhal, A. NSL- KNN, DT, NB, Accuracy "Inference Detection Model Enhances
Maan et al. KDD SVM Independent Accuracy Inference Factors"
[39] Function. • Enhances detection of unobservable
KNN = 95.755% KNN = 97.605% factors.
DT = 98.088% DT = 98.554% • Uses SVM technique for improved
NB = 82.971% NB = 66.687% accuracy.
SVM = 81.263% SVM = 93.914%
J. D. S. W.S. KDD LDA, SVMr, RF, Sensitivity(TPR) Specificity (TNR) ROC "Inference Detection Model Enhances
and P. B. [40] CUP’99 ADABoost LDA = 0.9762 0.7121 0.8827 Factors"
SVMr = 0.9933 0.9511 0.9782 • Enhances detection of unobservable
RF = 0.9884 0.9495 0.9718 factors.
ADABoost = 0.9975 0.9569 • Uses SVM technique for improved
0.9824 accuracy.

P. V. Pandit, CICIDS2 RF,XG Boost, Accuracy, Precision, Recall, f1-score, Python Technique Enhances Detection
S. Bhushan, et 017 Extra Tree,DT RF = 0.9899 Accuracy
al. [43] XG Boost = 0.9929 • Utilizes precision, recall, f1-score metrics.
Extra Tree = 0.9935 • Outperforms state-of-the-art technology.
DT = 0.9946

Anouar UNSWN SVM model for TPR, FPR, Accuracy, Precision, Recall, Simulation Results:
Bachar, N. E. B-15 binary f1-score • SVM-Gaussian model improves accuracy
[44] classification SVM Polynomial = 94% by 93%
SVM Gaussian = 93% • SVM-Polynomial model enhances
accuracy by 94%.

Nitu Dash, NSL MLP, Radial Basis Accuracy PCA Feature Extraction Model
S. C. [45] KDD Classifier, and PCA + MLP = 97.8803 % • Enhances accuracy in short computational
gradient descent PCA + RBF Classifier = 98.1162% time.
Backpropagation • RBF: 98.1162% accuracy in 36.28 seconds.
learning algorithm • MLP: 97.8803% accuracy in 41.37 seconds.
Kumar, G. UNSWN KNN, LR, NB, Accuracy, Precision, Recall, F1-Score, RF Classifier Performance on UNSWNB-15
K. [46] B-15 SGD and RF MSE, TPR, FPR • Outperforms other classifiers.
KNN = 98.28% • Accuracy: 99.57% with all features, 99.64%
LR = 98.42% with some features.
NB = 76.59%
SGD = 98.16%
RF = 99.57%
Senthilnaya KDD Modified- Detected assault accuracy Modified KNN Feature Selection
ki, B., CUP’99 KNN,SVM and M-KNN = 98.58% • Reduces undesirable traits.
Venkatalaks KNN • Improves security.
hmi [50] • Reduces false alarm rates.
• Outperforms other algorithms.
J. Gao, S. UNSWN ELM, MVT Accuracy, TP, TN, FP, FN "MVT Improves IDS Accuracy"
Chai, C. B-15 (Multi-Voting Accuracy = 89.71% • Superior to IDS without MVT.
Zhang et al. Technology) • Suggests strategy for high detection accuracy.
[89] • Reduces time required.
M. Zaman Kyoto KM,KNN,FCM,N Precision, Recall, Accuracy, ROC RBF Classification Method Performance
and C.-H. 2006+ B,SVM, RBF, Accuracy ROC • Superior accuracy: 0.9754
Lung [90] Ensemble KM = 0.836 KM = 0.6148 • Outperforms Ensemble approach: 0.9631
KNN = 0.9754 KNN = 0.9532
FCM = 0.836 FCM = 0.6148
NB = 0.9672 NB = 0.9481
SVM = 0.9426 SVM = 0.8023
RBF = 0.9754 RBF = 0.9741
Ensemble = 0.9672 Ensemble = 0.9639
Mazarbhuiy KDDCU IFRSCAD Normal TPR, Attack TPR "Proposed Algorithm Outperforms
a, Fokrul et P'99, (Intuitionistic Normal detected assaults Classification-Based Algorithms"
al. [91] Kitsune Fuzzy-Rough Set- KDD CUP’99 • Extracts anomalies with 96.99% accuracy.
Based IFRSCAD (Normal TPR) = 96.99% and • Demonstrates superior performance with
Classification for 91.289% KDDCUP'99 and Kitsune datasets.
Anomaly Kitsune
Detection) IFRSCAD (Attack TPR) = 96.29 and
91.289%
Shen Kejia, NSL- FRSTSS, FRSTFS Accuracy of detected assaults Network-Based Intrusion Detection Systems
Hamid KDD (Fuzzy Rough Set FRSTSS+FRSTFS+SVM = R2L, U2R, Evaluation
Parvin et al Theory based Probe, DoS, Normal = 99.21%, 57.07%, • Uses fuzzy rough set theory and SVM.
[92] Sample Selection 99.94%, 97.98%, 97.02% • Highlights feature selection techniques'
and Feature efficiency.
Selection ), SVM
Sever, Hayri ADFA-LD RSC(Rough Set Precision, Recall, F-Score Study on Attribute Reduction Strategy
& Raoof et al. and NSL- Classification) RSC = 88.9 • Demonstrates impact on classification accuracy.
[93] KDD approach using KNN = 86.2 • RSC model achieves 88.9% F-score.
MODLEM SVM = 82.0 • Uses fuzzy, rough set of ten features.
algorithm, KNN, NB = 74.8
SVM, NB, DT DT = 86.1
Q. Zhang, Y. KDD Cup MFNN ( Multi- Detected assaults "Evaluating Feature Selection Techniques"
Qu et al. [94] 99 functional nearest- MFNN = 99.62% • Focuses on KFRFS for accuracy and reduction.
neighbour), NB, NB = 91.03% • Highlights superior computational efficiency.
SMO, IBK (Instance- SMO = 97.30%
based), Ada-boost IBk = 99.64%
and RF Ada-Boost = 99.89%
RF = 99.93%
Panigrahi, NSL-KDD Fuzzy NN, Fuzzy- Accuracy, Precision, Recall, FAR Study on Classifier Performance
Ashalata & Rough NN, FRONN, Random Search • Evaluates accuracy, detection rate, precision, false
Patra, Manas VQNN, OWANN. Fuzzy NN = 94.5591 alarm rate.
[95] Fuzzy Rough NN = 99.5951 • Finds fuzzy ownership nearest neighbor
VQNN = 99.3991 classification with random search superior.
Fuzzy Ownership NN = 99.6086
OWANN = 99.388
Tripathy, S. KDD Cup KNN, DT, MNB, Accuracy, Precision, Recall, F-1 Score SVM Classifier Performance
S., & Behera, 99 BNB, RF, SVM, KNN = 0.9724, DT = 0.9713, MNB = 0.9329, • Outperforms other classifiers with 98.08% score.
B. [4] PPN, LR, BNB = 0.9473, RF = 0.9714, SVM = 0.9808, PPN • DT, RF, BPN, KNN perform well.
XGBOOST, = 0.9101, LR = 0.9667, XGBOOST = 0.9464,
AdaBoost, SGD, AdaBoost = 0.9431, SGD = 0.9046, Ridge =
Ridge, RC, PA, BPN 0.9495, RC = 0.9487, PA = 0.9443, BPN = 0.9704

Kushal Jani, NSL- KNN,RF,DT,NB, Accuracy, Precision, Recall Deep Learning vs Artificial Neural Networks
Punit Lalwani KDD SVM,ANN,DNN KNN = 76.47% • Deep learning: 86.75% accuracy
et al. [96] RF = 78.43% • Artificial Neural Networks: 84.87%
DT = 80.95% accuracy
NB = 82.07% • Minimal false negatives and positives.
SVM = 82.63%
ANN = 84.87%
DNN = 86.75%
Kocher, Geeta UNSW- KNN,NB,RF,SG Accuracy, Precision, Recall, F-1 Score RF Classifier Performance on UNSW15
& Kumar NB15 D,LR LR = 98.17% Dataset
Ahuja [97] NB = 75.16% • Outperforms other classifiers with 99.64%
RF = 99.64% accuracy.
SGD = 97.99% • Potential for multiclass classification
KNN = 98.90% intrusion detection.

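The accuracy, precision, recall, F1-score, and false-alarm-rate figures quoted throughout Table 8 follow the standard confusion-matrix definitions, which can be sketched as below; the TP/TN/FP/FN counts are illustrative, not drawn from any reviewed study.

```python
# Confusion-matrix metrics used throughout Table 8.

def metrics(tp, tn, fp, fn):
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)                         # detection rate (DR)
    f1        = 2 * precision * recall / (precision + recall)
    far       = fp / (fp + tn)                         # false alarm rate (FAR)
    return accuracy, precision, recall, f1, far

acc, prec, rec, f1, far = metrics(tp=90, tn=95, fp=5, fn=10)
print(acc, prec, rec, f1, far)
```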
6. Conclusions and Future Works

Fast growth of the web, including networking sites and modern communication methods, has led to a boom in networking information and, with it, an unprecedented variety of new hazards to computer security. In response, more investigators are developing and putting into practice more complex strategies, applying machine learning (ML) techniques such as SVM, KNN, DT, LR, NB, RF, XGBoost, AdaBoost, and ANN, to help combat these threats to our digital lives. Among these, SVMs are regarded as one of the distinctive algorithms used in machine learning for intrusion detection, owing to their exceptional generalization ability across dataset sizes and their capacity to avert the curse of dimensionality. In this review paper we have discussed IDS datasets and their specifications, including prior research on the various IDS types and the various ML classification algorithms. Along with presenting an in-depth review of datasets such as KDDCUP'99, NSL-KDD, UNSW-NB15, CICIDS-2017, and CSE-CIC-IDS2018, we additionally discussed the significance of using different ML classifiers for intrusion detection and performance evaluation.

We have reviewed research studies that implement the ML classification algorithms pointed out above for the detection of intrusions, covering their techniques and methods of operation. Furthermore, we have offered a tabulated summary and critical analysis of each of these methods, assessed across five different datasets and a range of ML classifiers. This highlights the functions of the various algorithms employed, the attacks that were discovered, and the results of the performance evaluation, including accuracy metrics.

Advancements in machine learning techniques are being used by academics and researchers to develop classification models for intrusion detection systems. This paper reviews studies from the past decade that have implemented machine learning classification methods to enhance the performance of intrusion detection systems with high accuracy and low false alarm rates.

We plan to extend this research in the future by incorporating a systematic analysis and review of IDS applied to other popular digital-security or industry-focused real-time intrusion datasets using machine and deep learning methodologies such as CNN, RNN, deep autoencoders, and generative adversarial networks (GANs). These methods have significant potential in computer networking, potentially enhancing the reliability and efficiency of intrusion detection systems.

Acknowledgements

My sincere gratitude goes out to my mentor, Dr. Bichitrananda Behera of the Department of Computer Science and Engineering at C.V. Raman Global University in Bhubaneswar, Odisha, who helped me finish this topic and guided me in pursuing it. Additionally, I want to express my gratitude to my family and my brother for his amazing advice and unceasing support. I consider it an honor that I was able to conduct my research with his help.

Author contributions:

1. Sudhanshu Sekhar Tripathy: Conceptualization, Algorithm design, Literature study, Dataset analysis
2. Dr. Bichitrananda Behera: Algorithm validation, Dataset investigation

Conflict of Interest

The authors declare that they have no conflict of interest.

Competing Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Funding Details

No funding was received to assist with the preparation of this manuscript.

References

[1] L. Buczak and E. Guven, "A Survey of Data Mining and Machine Learning Methods for Cyber Security Intrusion Detection," IEEE Commun. Surv. Tutorials, vol. 18, pp. 1153–1176, 2016. doi: 10.1109/COMST.2015.2494502.
[2] Denning, Dorothy E. "An Intrusion-Detection Model." IEEE Transactions on Software Engineering SE-13 (1987): 222-232.
[3] Xu, X. (2006). Adaptive intrusion detection based on machine learning: feature extraction, classifier construction, and sequential pattern prediction. International Journal of Web Services Practices, 2(1-2), 49-58.
[4] Tripathy, S. S., & Behera, B. "Performance evaluation of machine learning algorithms for intrusion detection system," Journal of Biomechanical Science and Engineering, pp. 621–640, July 2023. doi: 10.17605/OSF.IO/WX6CS.
[5] F. Sabahi and A. Movaghar, "Intrusion detection: A survey," in 2008 Third International Conference on Systems and Networks Communications (pp. 23-26). IEEE, October 2008.
[6] Dhanabal, L., & Shantharajah, S. P. (2015). A study on NSL-KDD dataset for intrusion detection system based on classification algorithms. International Journal of Advanced Research in Computer and Communication Engineering, 4(6), 446-452.
[7] Tavallaee, M., Bagheri, E., Lu, W., & Ghorbani, A. A. (2009, July). A detailed analysis of the KDD CUP 99 data set. In 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications (pp. 1-6). IEEE.
[8] Dhanabal, L., & Shantharajah, S. P. (2015). A study on NSL-KDD dataset for intrusion detection system based on classification algorithms. International Journal of Advanced Research in Computer and Communication Engineering, 4(6), 446-452.
[9] I. Sharafaldin, A. H. Lashkari, A. A. Ghorbani, Toward generating a new intrusion detection dataset and intrusion traffic characterization, in Proceedings of the 4th International Conference on Information Systems Security and Privacy (ICISSP 2018), Vol. 1, 2018, pp. 108–116. doi: 10.5220/0006639801080116.
[10] Intrusion detection evaluation dataset (CIC IDS2017), https://www.unb.ca/cic/datasets/ids-2017.html.
[11] A. Thakkar, R. Lohiya, A review of the advancement in intrusion detection datasets, Procedia Computer Science 167 (2020) 636–645. doi: 10.1016/j.procs.2020.03.330.
[12] Meftah, Souhail, Tajje-eddine Rachidi and Nasser Assem. "Network Based Intrusion Detection Using the UNSW-NB15 Dataset." International Journal of Computing and Digital Systems (2019): n. pag.
[13] Y. Zhou, G. Cheng, S. Jiang, and M. Dai, "Building an efficient intrusion detection system based
https://www.unb.ca/cic/datasets/ids-2018.html (accessed May 30, 2020).
[17] M. K. Ibraheem, I. M. A. Al-Khafaji, and S. A. Dheyab, "Network intrusion detection using deep learning based on dimensionality reduction," REVISTA AUS, vol. 26, no. 2, pp. 168–174, 2019.
[18] Verma P, Shadab K, Shayan A. and Sunil B. (20 Network Intrusion Detection using Clustering and Gradient Boosting. International Conference on Computing, Communication and Networking Technologies (ICCCNT). (pp. 1-7). IEEE.
[19] Dutt I. et al. (2018). Real-Time Hybrid Intrusion Detection System. International Conference on Communication, Devices and Networking (ICCDN). (pp. 885-894). Springer.
[20] Deyban P., Miguel A. A., David P. A., and Eugenio S. (2017). Intrusion detection in computer networks using hybrid machine learning techniques. XLIII Latin American Computer Conference (CLEI). (pp. 1-10). IEEE.
[21] Maniriho et al. (2020). Detecting Intrusions in Computer Network Traffic with Machine Learning Approaches. International Journal of Intelligent Engineering and Systems, INASS. (433-445).
[22] Iqbal, A., & Aftab, S. (2019). A Feed Forward and Pattern Recognition ANN Model for Network Intrusion Detection. International Journal of Computer Network and Information Security.
[23] M. R. Watson, A. K. Marnerides, A. Mauthe, and D. Hutchison (2016). Malware detection in cloud computing infrastructures. IEEE Transactions on Dependable and Secure Computing, 13(2):192-205.
[24] Kaja, N., Shaout, A. K., & Ma, D. (2019). An intelligent intrusion detection system. Applied Intelligence, 49, 3235–3247.
[25] M. R. Watson, A. K. Marnerides, A. Mauthe, and D. Hutchison (2016). Malware detection in cloud computing infrastructures. IEEE Transactions on Dependable and Secure Computing, 13(2):192-205.
[26] Kasongo SM, Sun Y (2020). Performance analysis of intrusion detection systems using a feature selection method on the UNSW-NB15 dataset. J Big Data 7:1–20.
[27] Modi, Urvashi & Jain, Anurag. (2016). An Improved Method to Detect Intrusion Using Machine Learning Algorithms. Informatics Engineering, an International Journal. 4. 17-29. doi: 10.5121/ieij.2016.4203.
[28] A. A. Yilmaz, "Intrusion Detection in Computer Networks using Optimized Machine Learning
on feature selection and ensemble classifier,” Computer Algorithms," 2022 3rd International Informatics and
Networks, vol. 174, p. 107247, Jun. 2020, doi: Software Engineering Conference (IISEC), Ankara,
10.1016/j.comnet.2020.107247. Turkey, 2022, pp. 1-5, doi:
[14] R. I. Farhan, A. T. Maolood, and N. F. Hassan, 10.1109/IISEC56263.2022.9998258
“Optimized deep [29] Ghose, Dipayan & Partho, All & Ahmed, Minhaz
learning with binary PSO for intrusion detection on & Chowdhury, Md Tanvir & Hasan, Mahamudul & Ali,
CSE-CICIDS2018 dataset,” Journal of Al-Qadisiyah for Md & Jabid, Taskeed & Islam, Maheen. (2023).
Computer Science and Mathematics, vol. 12, no. 3, pp. Performance Evaluation of Intrusion Detection System
16–27, 2020, doi: Using Machine Learning and Deep Learning
https://doi.org/10.29304/jqcm.2020.12.3.706. Algorithms. 1-6.10.1109/IBDAP58581.2023.10271964.
[15] [“Registry of open data on AWS.” [30] K. Dinesh and D. Kalaivani, "Enhancing
https://registry.opendata.aws/cse-cic-ids2021/ (accessed Performance of Intrusion detection System in the NSL-
May 30, 2020”. KDD Dataset using Meta-Heuristic and Machine
[16] Canadian Institute for Cybersecurity (CIC), “CSE- Learning Algorithms-Design thinking approach," 2023
CIC-IDS2018 on AWS.” International Conference on Sustainable Computing and

International Journal of Intelligent Systems and Applications in Engineering IJISAE, 2024, 12(4), 3833–3857 | 3854
Smart Systems (ICSCSS), Coimbatore, India, 2023, pp. 1471-1479, doi: 10.1109/ICSCSS57650.2023.10169845.
[31] J. Ren, J. Guo, W. Qian, H. Yuan, X. Hao, and H. Jingjing, "Building an Effective Intrusion Detection System by Using Hybrid Data Optimization Based on Machine Learning Algorithms," Security and Communication Networks, vol. 2019, pp. 1–11, Jun. 2019, doi: 10.1155/2019/7130868.
[32] K.-A. Tait et al., "Intrusion Detection using Machine Learning Techniques: An Experimental Comparison," 2021 International Congress of Advanced Technology and Engineering (ICOTEN), Taiz, Yemen, 2021, pp. 1-10, doi: 10.1109/ICOTEN52080.2021.9493543.
[33] Brao, Bobba & Swathi, Kailasam. (2017). Fast kNN Classifiers for Network Intrusion Detection System. Indian Journal of Science and Technology, 10, 1-10, doi: 10.17485/ijst/2017/v10i14/93690.
[34] Farnaaz, Nabila & Akhil, Jabbar. (2016). Random Forest Modeling for Network Intrusion Detection System. Procedia Computer Science, 89, 213-217, doi: 10.1016/j.procs.2016.06.047.
[35] Lin, Wei-Chao & Ke, Shih-Wen & Tsai, Chih-Fong. (2015). CANN: An Intrusion Detection System Based on Combining Cluster Centers and Nearest Neighbors. Knowledge-Based Systems, 78, doi: 10.1016/j.knosys.2015.01.009.
[36] G. Yedukondalu, G. H. Bindu, J. Pavan, G. Venkatesh, and A. SaiTeja, "Intrusion Detection System Framework Using Machine Learning," 2021 Third International Conference on Inventive Research in Computing Applications (ICIRCA), Coimbatore, India, 2021, pp. 1224-1230, doi: 10.1109/ICIRCA51532.2021.9544717.
[37] S. Waskle, L. Parashar, and U. Singh, "Intrusion Detection System Using PCA with Random Forest Approach," in 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC), Coimbatore, India, Jul. 2020, pp. 803–808, doi: 10.1109/ICESC48915.2020.9155656.
[38] M. Hammad, W. El-many, and Y. Ismail, "Intrusion Detection System using Feature Selection With Clustering and Classification Machine Learning Algorithms on the UNSW-NB15 dataset," in 2020 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT), Sakheer, Bahrain, Dec. 2020, pp. 1–6, doi: 10.1109/3ICT51146.2020.9312002.
[39] A. Singhal, A. Maan, D. Chaudhary, and D. Vishwakarma, "A Hybrid Machine Learning and Data Mining Based Approach to Network Intrusion Detection," 2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS), Coimbatore, India, 2021, pp. 312-318, doi: 10.1109/ICAIS50930.2021.9395918.
[40] J. D. S. W.S. and P. B., "Machine Learning based Intrusion Detection Framework using Recursive Feature Elimination Method," 2020 International Conference on System, Computation, Automation and Networking (ICSCAN), Pondicherry, India, 2020, pp. 1-4, doi: 10.1109/ICSCAN49426.2020.9262282.
[41] F. Kamalov, S. Moussa, R. Zgheib, and O. Mashaal, "Feature selection for intrusion detection systems," in 2020 13th International Symposium on Computational Intelligence and Design (ISCID), Hangzhou, China, Dec. 2020, pp. 265–269, doi: 10.1109/ISCID51228.2020.00065.
[42] Fang, Weijian & Tan, Xiaoling & Wilbur, Dominic. (2020). Application of intrusion detection technology in network safety based on machine learning. Safety Science, 124, 104604, doi: 10.1016/j.ssci.2020.104604.
[43] P. V. Pandit, S. Bhushan, and P. V. Waje, "Implementation of Intrusion Detection System Using Various Machine Learning Approaches with Ensemble learning," 2023 International Conference on Advancement in Computation & Computer Technologies (InCACCT), Gharuan, India, 2023, pp. 468-472, doi: 10.1109/InCACCT57535.2023.10141704.
[44] Anouar Bachar, N. E. (2020). ML for Network Intrusion Detection Based on SVM Binary Classification Model. Advances in Science, Technology and Engineering Systems Journal, 638-644.
[45] Nitu Dash, S. C. (2018). Intrusion Detection System Based on Principal Component Analysis and ML Techniques. International Journal of Engineering Development and Research, 359-367.
[46] Kumar, G. K. (2021). Analysis of ML Algorithms with Feature Selection for Intrusion Detection Using UNSW-NB15 Dataset. International Journal of Network Security & Its Applications (IJNSA), Vol. 13, No. 1, January 2021.
[47] Khammassi, Chaouki & Krichen, Saoussen. (2017). A GA-LR Wrapper Approach for Feature Selection in Network Intrusion Detection. Computers & Security, 70, doi: 10.1016/j.cose.2017.06.005.
[48] G. Madhukar, G. N. (2019). An Intruder Detection System based on Feature Selection using RF Algorithm. International Journal of Engineering and Advanced Technology (IJEAT).
[49] B. S. Bhati and C. S. Rai, "Analysis of support vector machine-based intrusion detection techniques," Arabian Journal for Science and Engineering, vol. 45, no. 4, pp. 2371–2383, Apr. 2020.
[50] Senthilnayaki, B., Venkatalakshmi, K., & Kannan, A. (2019). Intrusion detection system using set feature selection and modified KNN classifier. International Arab Journal of Information Technology, 16(4), 746-753.
[51] Chakrawarti, A., & Shrivastava, S. S. (2024). Enhancing Intrusion Detection System using Deep Q-Network Approaches based on Reinforcement Learning. International Journal of Intelligent Systems and Applications in Engineering, 12(12s), 34–45.
[52] Soucy, P., & Mineau, G. W. A simple KNN algorithm for text categorization. In Proceedings of the 2001 IEEE International Conference on Data Mining, San Jose, CA, USA, 29 November–2 December 2001; pp. 647–648.
[53] M. Kumar, M. Hanumanthappa, and T. V. S. Kumar, "Intrusion Detection System using decision tree algorithm," 2012 IEEE 14th International Conference on Communication Technology, Chengdu, China, 2012, pp. 629-634, doi: 10.1109/ICCT.2012.6511281.
[54] Belavagi, M. C., & Muniyal, B. (2016).
"Performance evaluation of supervised machine learning algorithms for intrusion detection." Procedia Computer Science, 89(1): 117-123.
[55] Nivedita Naidu and R. V. Dharaskar, "An effective approach to network intrusion detection system using genetic algorithm," International Journal of Computer Applications (0975–8887), Volume 1, No. 2, 2010.
[56] Dini, P., Elhanashi, A., Begni, A., Saponara, S., Zheng, Q., & Gasmi, K. Overview on Intrusion Detection Systems Design Exploiting Machine Learning for Networking Cybersecurity. Applied Sciences, 2023; 13(13):7507. https://doi.org/10.3390/app13137507.
[57] S. S. Haykin. Neural Networks and Learning Machines, volume 3. Pearson, Upper Saddle River, NJ, USA: 2009.
[58] Chakrawarti, A., & Shrivastava, S. S. (2024). Enhancing Intrusion Detection System using Deep Q-Network Approaches based on Reinforcement Learning. International Journal of Intelligent Systems and Applications in Engineering, 12(12s), 34–45.
[59] M. Almseidin, M. Alzubi, S. Kovacs, and M. Alkasassbeh, "Evaluation of machine learning algorithms for an intrusion detection system," 2017 IEEE 15th International Symposium on Intelligent Systems and Informatics (SISY), Subotica, Serbia, 2017, pp. 000277-000282, doi: 10.1109/SISY.2017.8080566.
[60] Manjula C. Belavagi and Balachandra Muniyal, "Performance Evaluation of Supervised Machine Learning Algorithms for Intrusion Detection," Twelfth International Multi-Conference on Information Processing, 2016.
[61] Saranya, T., Sridevi, S., Deisy, C., Chung, T. D., & Khan, M. A. (2020). Performance analysis of machine learning algorithms in intrusion detection system: A review. Procedia Computer Science, 171, 1251-1260.
[62] Mahmood, R. A. R., Abdi, A., & Hussin, M. (2021). Performance Evaluation of Intrusion Detection System using Selected Features and Machine Learning Classifiers. Baghdad Science Journal, 18(2 Suppl.), 0884. https://doi.org/10.21123/bsj.2021.18.2(Suppl.).0884.
[63] M. Choubisa, R. Doshi, N. Khatri, and K. Kant Hiran, "A Simple and Robust Approach of Random Forest for Intrusion Detection System in Cyber Security," 2022 International Conference on IoT and Blockchain Technology (ICIBT), Ranchi, India, 2022, pp. 1-5, doi: 10.1109/ICIBT52874.2022.9807766.
[64] Mohammadi, Sara & Mirvaziri, H. & Ghazizadeh-Ahsaee, Mostafa & Karimipour, Hadis. (2019). Cyber intrusion detection by combined feature selection algorithm. Journal of Information Security and Applications, 44, 80-88, doi: 10.1016/j.jisa.2018.11.007.
[65] W. L. Al-Yaseen, Z. A. Othman, and M. Z. A. Nazri, "Multi-level hybrid support vector machine and extreme learning machine based on modified K-means for an intrusion detection system," Expert Systems with Applications, vol. 67, pp. 296–303, 2017.
[66] Chua, T.-H., & Salam, I. Evaluation of Machine Learning Algorithms in Network-Based Intrusion Detection Using Progressive Dataset. Symmetry, 2023, 15, 1251. https://doi.org/10.3390/sym15061251.
[67] Thaseen, Sumaiya & Cherukuri, Aswani Kumar. (2016). Intrusion Detection Model Using a fusion of Chi-square feature selection and multi-class SVM. Journal of King Saud University - Computer and Information Sciences, 29, doi: 10.1016/j.jksuci.2015.12.004.
[68] Safura A. Mashayak and Balaji R. Bombade. (2019). Network Intrusion Detection Exploitation Machine Learning Strategies with the Utilization of Feature Elimination Mechanism. International Journal of Computer Sciences and Engineering, 7(5), 1292-1300.
[69] Bhavani T. T., Kameswara M. R., and Manohar A. R. (2020). Network Intrusion Detection System using Random Forest and Decision Tree Machine Learning Techniques. International Conference on Sustainable Technologies for Computational Intelligence (ICSTCI) (pp. 637-643). Springer.
[70] Ponthapalli R. et al. (2020). Implementation of Machine Learning Algorithms for Detection of Network Intrusion. International Journal of Computer Science Trends and Technology (IJCST), pp. 163-169.
[71] Lin, Wei-Chao & Ke, Shih-Wen & Tsai, Chih-Fong. (2015). CANN: An Intrusion Detection System Based on Combining Cluster Centers and Nearest Neighbors. Knowledge-Based Systems, 78, doi: 10.1016/j.knosys.2015.01.009.
[72] A. Aziz, Amira & Hanafi, Sanaa & Hassanien, Aboul Ella. (2016). Comparison of classification techniques applied for network intrusion detection and classification. Journal of Applied Logic, 24, doi: 10.1016/j.jal.2016.11.018.
[73] Alkasassbeh and Almseidin. (2018). Machine Learning Methods for Network Intrusions. International Conference on Computing, Communication (ICCCNT). arXiv.
[74] Rasane, Komal & Bewoor, Laxmi & Meshram, Vishal. (2019). A Comparative Analysis of Intrusion Detection Techniques: Machine Learning Approach. SSRN Electronic Journal, doi: 10.2139/ssrn.3418748.
[75] Anwer, H. M., Farouk, M., & Abdel-Hamid, A. A. (2018). A framework for efficient network anomaly intrusion detection with features selection. 2018 9th International Conference on Information and Communication Systems (ICICS), 157-162.
[76] Ravale, Ujwala & Marathe, Nilesh & Padiya, Puja. (2015). Feature Selection Based Hybrid Anomaly Intrusion Detection System Using K Means and RBF Kernel Function. Procedia Computer Science, 45, 428-435, doi: 10.1016/j.procs.2015.03.174.
[77] Kotpalliwar, M. V., & Wajgi, R. (2015). Classification of attacks using support vector machine (SVM) on KDD Cup'99 IDS database. In: 2015 Fifth International Conference on Communication Systems and Network Technologies. IEEE, pp. 987–990.
[78] Anish Halimaa, A., & Sundarakantham, K. (2019). Machine Learning Based Intrusion Detection System. 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli, India, pp. 916–920, doi: 10.1109/ICOEI.2019.8862784.
[79] Basheri, Mohammad & Iqbal, Javed & Raheem, A. (2018). Performance Comparison of Support Vector Machine, Random Forest, and Extreme Learning
Machine for Intrusion Detection. IEEE Access, PP, 1-1, doi: 10.1109/ACCESS.2018.2841987.
[80] S. Teng, N. Wu, H. Zhu, L. Teng, and W. Zhang, "SVM-DT-based adaptive and collaborative intrusion detection," IEEE/CAA Journal of Automatica Sinica, vol. 5, no. 1, pp. 108-118, Jan. 2018, doi: 10.1109/JAS.2017.7510730.
[81] D. Gupta, S. Singhal, S. Malik, et al., "Network intrusion detection system using various data mining techniques," 2016 International Conference on Research Advances in Integrated Navigation Systems (RAINS), Bangalore, India, 2016, pp. 1-6, doi: 10.1109/RAINS.2016.7764418.
[82] K. Goeschel, "Reducing false positives in intrusion detection systems using data-mining techniques utilizing support vector machines, decision trees, and naive Bayes for off-line analysis," SoutheastCon 2016, IEEE, pp. 1–6, 2016.
[83] S., Kumar, K., & Bhatnagar, V. (2021). Machine Learning Algorithms Performance Evaluation for Intrusion Detection. Journal of Information Technology Management, 13(1), 42-61, doi: 10.22059/jitm.2021.80024.
[84] Aburomman, Abdulla & Reaz, Mamun Bin Ibne. (2016). Ensemble binary SVM classifiers based on PCA and LDA feature extraction for intrusion detection. pp. 636-640, doi: 10.1109/IMCEC.2016.7867287.
[85] Al-Jarrah, O. Y., Al-Hammdi, Y., Yoo, P. D., Muhaidat, S., & Al-Qutayri, M. (2018). "Semi-supervised multi-layered clustering model for intrusion detection." Digital Communications and Networks, 4(4): 277-286.
[86] B. M. Irfan, V. Poornima, S. Mohana Kumar, U. S. Aswal, N. Krishnamoorthy, and R. Maranan, "Machine Learning Algorithms for Intrusion Detection Performance Evaluation and Comparative Analysis," 2023 4th International Conference on Smart Electronics and Communication (ICOSEC), Trichy, India, 2023, pp. 01-05, doi: 10.1109/ICOSEC58147.2023.10275831.
[87] M. D. Rokade and Y. K. Sharma, "MLIDS: A Machine Learning Approach for Intrusion Detection for Real-Time Network Dataset," in 2021 International Conference on Emerging Smart Computing and Informatics (ESCI), Pune, India, Mar. 2021, pp. 533–536, doi: 10.1109/ESCI50559.2021.9396829.
[88] S. Waskle, L. Parashar, and U. Singh, "Intrusion Detection System Using PCA with Random Forest Approach," in 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC), Coimbatore, India, Jul. 2020, pp. 803–808, doi: 10.1109/ICESC48915.2020.9155656.
[89] J. Gao, S. Chai, C. Zhang, B. Zhang, and L. Cui, "A Novel Intrusion Detection System based on Extreme Machine Learning and Multi-Voting Technology," 2019 Chinese Control Conference (CCC), Guangzhou, China, 2019, pp. 8909-8914, doi: 10.23919/ChiCC.2019.8865258.
[90] M. Zaman and C.-H. Lung, "Evaluation of machine learning techniques for network intrusion detection," NOMS 2018 - 2018 IEEE/IFIP Network Operations and Management Symposium, Taipei, Taiwan, 2018, pp. 1-5, doi: 10.1109/NOMS.2018.8406212.
[91] Mazarbhuiya, Fokrul & Shenify, Mohamed. (2023). An Intuitionistic Fuzzy-Rough Set-Based Classification for Anomaly Detection. Applied Sciences, 13, 1-21, doi: 10.3390/app13095578.
[92] Shen Kejia, Hamid Parvin, Sultan Noman Qasem, Bui Anh Tuan, and Kim-Hung Pho. (2020). A classification model based on SVM and fuzzy rough set for network intrusion detection. Journal of Intelligent & Fuzzy Systems, 39(5), 6801–6817. https://doi.org/10.3233/JIFS-191621.
[93] Sever, Hayri & Raoof Nasser, Ahmed. (2019). Host-based intrusion detection architecture based on rough set theory and machine learning. Journal of Engineering and Applied Sciences, 14, 415-422, doi: 10.3923/jeasci.2019.415.422.
[94] Zhang, Qiangyi & Qu, Yanpeng & Deng, Ansheng. (2018). Network Intrusion Detection Using Kernel-based Fuzzy-rough Feature Selection. pp. 1-6, doi: 10.1109/FUZZ-IEEE.2018.8491578.
[95] Panigrahi, A., & Patra, M. R. (2016). Fuzzy Rough Classification Models for Network Intrusion Detection. Transactions on Engineering and Computing Sciences, 4(2), 07. https://doi.org/10.14738/tmlai.42.1882.
[96] Kushal Jani, Punit Lalwani, Deepak Upadhyay, and M. B. Potdar. Performance Evolution of Machine Learning Algorithms for Network Intrusion Detection System. International Journal of Computer Engineering and Technology, 9(5), 2018, pp. 181-189.
[97] Kocher, Geeta & Kumar Ahuja, Gulshan. (2021). Analysis of Machine Learning Algorithms with Feature Selection for Intrusion Detection using UNSW-NB15 Dataset. International Journal of Network Security & Its Applications, 13, 21-31, doi: 10.5121/ijnsa.2021.13102.