B.E. Project: Department of Computer Engineering and Information Technology
Application
To,
The Project co-ordinator,
Computer Engineering Department,
PVG’s COET, Pune.
Sub:- Submission of Project Synopsis.
Respected Sir,
We, the undersigned students of B.E. Computer, are submitting our project synopsis. We are
bound by the department's decision regarding our selected project title and are submitting
the final synopsis for the selected project. Henceforth, we will not change the project group or
the selected project title/topic for any reason.
Thanking you.
Group Id:- 4
Abstract:-
As the internet grows, cybersecurity problems grow with it. Attackers carry out various malicious
activities to obtain a victim's information, which they then use for illegal purposes. Developing
applications to mitigate these threats is complicated by factors such as the growth in the
amount of data and the variety of data that can come from different sources. In this project we
design an architecture, built on top of big data frameworks, that aims to mitigate cybersecurity
problems such as phishing. The architecture enables big data applications to be implemented in
the context of cybersecurity. It is designed to detect phishing emails in a large data set and in
the information collected by a honeypot.
Introduction:-
Security issues become more critical due to factors such as the large volumes and variety of data
that may be vulnerable, the diversity of data sources and formats, and the velocity at which data
are generated, typically as a high-volume stream. Enterprises usually
collect terabytes of security-relevant data, including network traffic, and software application
events, among others. However, well-established techniques are often not scalable
and typically produce many false positives when dealing with large amounts of data, degrading
their efficacy. To face these emerging problems, big data analytics has attracted the interest of the
security community. The use of big data frameworks for security solutions presents several
benefits, such as the possibility of storing and using large quantities of security data. Although
the analysis of logs, network flows, and system events has been used in security solutions for
several decades, conventional technologies are not adequate for such long-term, large-scale
volumes. In general, traditional infrastructure keeps the data only for a limited period.
Besides that, traditional techniques are inefficient when performing analytics and complex
queries on large, unstructured datasets, while big data platforms are designed to perform these
operations efficiently. In this project we present an architecture for cybersecurity applications
based on big data frameworks. Our architecture is capable of collecting data from different
sources and of storing, combining, and processing them effectively. For example, pcap files and
other logs from a honeypot, as well as data streams collected from blacklist sites, can all be
stored in our system.
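The idea of combining several sources in one store can be illustrated with a minimal Python sketch. It is not the project's implementation: the record schema, the function names, and the simple substring match against blacklisted URLs are all assumptions made for illustration, and a real deployment would run on a big data framework rather than on in-memory lists.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SecurityRecord:
    """One item in the combined store (hypothetical schema)."""
    source: str   # "email", "honeypot_log", or "blacklist"
    payload: str  # raw text of the item


def combine_sources(emails: Iterable[str],
                    honeypot_lines: Iterable[str],
                    blacklist_urls: Iterable[str]) -> List[SecurityRecord]:
    """Tag each item with its origin and merge everything into one list."""
    records = [SecurityRecord("email", e) for e in emails]
    records += [SecurityRecord("honeypot_log", line) for line in honeypot_lines]
    records += [SecurityRecord("blacklist", url) for url in blacklist_urls]
    return records


def emails_with_blacklisted_urls(records: List[SecurityRecord]) -> List[str]:
    """Return emails whose text contains any blacklisted URL (naive check)."""
    blacklisted = {r.payload for r in records if r.source == "blacklist"}
    return [r.payload for r in records
            if r.source == "email"
            and any(url in r.payload for url in blacklisted)]
```

Keeping the origin of each record in a `source` field is one simple way to let a later processing step combine the blacklist data with the emails, as the sketch does.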
B. Class of problem
When solving a problem, we have to determine its difficulty level. Problems are commonly
grouped into the following complexity classes:
1) P Class
2) NP-Hard Class
3) NP-Complete Class
A decision problem is in P if there is a polynomial-time algorithm that computes the answer. A
decision problem is in NP if there is a polynomial-time algorithm for a non-deterministic
machine that computes the answer. Problems in P are trivially in NP: the non-deterministic
machine simply never forks another process and acts just like a deterministic one.
But there are some problems known to be in NP for which no polynomial-time deterministic
algorithm is known; in other words, we know they're in NP, but don't know whether they're in P. A
problem is NP-complete if you can prove that (1) it is in NP, and (2) a problem already known to
be NP-complete is polynomial-time reducible to it. A problem is NP-hard if and only if
it is at least as hard as an NP-complete problem. The conventional Traveling Salesman
Problem of finding the shortest route is NP-hard, not strictly NP-complete; its decision version,
which asks whether a route no longer than a given bound exists, is NP-complete.
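Membership in NP can also be read as "a proposed solution can be checked in polynomial time". The following Python sketch illustrates this for the decision version of the Traveling Salesman Problem. The function name `verify_tour` and the distance-matrix representation are assumptions made for this example only.

```python
from typing import List


def verify_tour(dist: List[List[float]], tour: List[int], bound: float) -> bool:
    """Check a proposed tour in polynomial time.

    dist  -- square matrix, dist[i][j] is the distance from city i to city j
    tour  -- proposed visiting order of the cities
    bound -- maximum allowed total length of the round trip
    """
    n = len(dist)
    # The tour must visit every city exactly once.
    if sorted(tour) != list(range(n)):
        return False
    # Sum the legs of the round trip, including the return to the start.
    total = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
    return total <= bound
```

Checking a given tour takes time polynomial in the number of cities, whereas no polynomial-time algorithm is known for finding the shortest tour.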
I = Input = {Emails, pcap files and other logs from a honeypot, data streams collected from
blacklist sites}
O = Output = {Successful detection of spam email}
S = Success = {Detection of spam email}
F = Failure = {Detection of spam email fails, connection loss}
System requirements:-
1. Hardware Requirement
- LAN cable
2. Software Requirement
- Ubuntu OS, Python, Java, virtually installed honeypot
Expected result:-
We expect that the designed project will be able to detect phishing emails.
Plan of project execution: