Detection of Data Exfiltration via DNS

Overview

Exfiltrating data over DNS, and maintaining tunneled command-and-control (C2) communication for malware, are among the critical attacks that adversaries use against enterprise networks to steal valuable and sensitive data. Because DNS traffic is allowed through firewalls by default, attackers can encode information in DNS queries with little risk of detection.
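To make the threat concrete, the sketch below shows how arbitrary bytes can be packed into DNS query names. It is a minimal illustration, not code from this project: the function name `encode_to_qnames` and the domain `exfil.example.com` are hypothetical, and Base32 is assumed as the encoding because its alphabet is valid in hostnames. The long, high-entropy labels it produces are the kind of signal a detector looks for.

```python
import base64

MAX_LABEL_LEN = 63  # RFC 1035 limit on the length of a single DNS label


def encode_to_qnames(data: bytes, domain: str = "exfil.example.com") -> list[str]:
    """Split `data` into DNS query names under a hypothetical attacker domain."""
    # Base32 uses only A-Z and 2-7, which are legal hostname characters.
    # Padding '=' is stripped and the text lowercased (DNS is case-insensitive).
    encoded = base64.b32encode(data).decode("ascii").rstrip("=").lower()
    chunks = [encoded[i:i + MAX_LABEL_LEN] for i in range(0, len(encoded), MAX_LABEL_LEN)]
    # Prefix a sequence number so the receiving name server can reassemble chunks.
    return [f"{seq}.{chunk}.{domain}" for seq, chunk in enumerate(chunks)]
```

For example, `encode_to_qnames(b"x" * 100)` yields three query names, each carrying up to 63 encoded characters in its second label.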

Solution

In this project, we introduce a real-time mechanism that detects data exfiltration and tunneling over DNS by training a machine learning model to recognize anomalous DNS queries.
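The model classifies each query from features computed on that query alone (stateless features). The exact feature set used by this project is not listed here, so the sketch below is an assumption: it extracts a few commonly used lexical features (name length, label count, longest label, digit and uppercase counts, and the Shannon entropy of the subdomain). The function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy in bits per character; 0.0 for an empty string."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def stateless_features(qname: str) -> dict[str, float]:
    """Hypothetical per-query features computed from the query name only."""
    name = qname.rstrip(".")
    labels = name.split(".") if name else []
    # Naive split: everything left of the last two labels is the subdomain.
    # This does not handle multi-part public suffixes such as co.uk.
    subdomain = ".".join(labels[:-2])
    return {
        "length": len(name),
        "num_labels": len(labels),
        "longest_label": max((len(label) for label in labels), default=0),
        "subdomain_length": len(subdomain),
        "num_digits": sum(ch.isdigit() for ch in name),
        "num_uppercase": sum(ch.isupper() for ch in name),
        "entropy": shannon_entropy(subdomain),
    }
```

A benign name such as `www.example.com` has a short, low-entropy subdomain, while an encoded chunk like `mzxw6ytboi3dgnzq.exfil.example.com` scores noticeably higher on length and entropy.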

Random Forest

Classification Report

Confusion Matrix

Decision Tree

Classification Report

Confusion Matrix

XGBoost

Classification Report

Confusion Matrix
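Each model above is summarized by a classification report and a confusion matrix. scikit-learn's `confusion_matrix` and `classification_report` compute these quantities; the stdlib-only sketch below just makes the arithmetic explicit for the binary case. It assumes label `1` means malicious and `0` means benign, and the function names are illustrative.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, positive=1):
    """Return (tn, fp, fn, tp), treating `positive` as the malicious class."""
    counts = Counter(zip(y_true, y_pred))
    tp = counts[(positive, positive)]
    fn = sum(v for (t, p), v in counts.items() if t == positive and p != positive)
    fp = sum(v for (t, p), v in counts.items() if t != positive and p == positive)
    tn = sum(v for (t, p), v in counts.items() if t != positive and p != positive)
    return tn, fp, fn, tp


def class_metrics(tp, fp, fn):
    """Precision, recall and F1 for one class, guarding against division by zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def classification_report(y_true, y_pred, positive=1):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, positive)
    return {
        "malicious": class_metrics(tp, fp, fn),
        # For the benign class the roles swap: a correct benign prediction (tn)
        # is its true positive, and a missed malicious query (fn) is its false positive.
        "benign": class_metrics(tn, fn, fp),
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
    }
```

Note that scikit-learn lays out the binary matrix as `[[tn, fp], [fn, tp]]` for labels `[0, 1]`, which is why this sketch returns the counts in the same order.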

Experiments

Model Evaluation

The three models achieve the same accuracy and F1-score for both classes, so we select the champion model based on the number and cost of false negatives and false positives. Our solution should weigh false negatives more heavily, because classifying a malicious DNS query as benign costs more than classifying a benign DNS query as malicious. We therefore select the model with a reasonable balance of false negatives and false positives. Based on this criterion, our champion model is XGBoost.
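One way to make this selection rule explicit is a weighted error cost. The sketch below is an assumption, not the project's procedure: the 5:1 weighting of false negatives over false positives is arbitrary, the function names are hypothetical, and the model names and counts in the usage example are made up for illustration only.

```python
def weighted_cost(fp: int, fn: int, fn_weight: float = 5.0, fp_weight: float = 1.0) -> float:
    """Total misclassification cost; fn_weight > fp_weight penalizes missed attacks more."""
    return fn_weight * fn + fp_weight * fp


def select_champion(results: dict[str, tuple[int, int]],
                    fn_weight: float = 5.0, fp_weight: float = 1.0) -> str:
    """`results` maps model name -> (false_positives, false_negatives)."""
    # min() returns the first model with the lowest cost (insertion order breaks ties).
    return min(results, key=lambda name: weighted_cost(*results[name], fn_weight, fp_weight))


# Hypothetical counts, not results from this project.
counts = {"model_a": (10, 40), "model_b": (30, 20), "model_c": (60, 15)}
print(select_champion(counts))                # model_b
print(select_champion(counts, fn_weight=10))   # model_c
```

The choice of champion depends directly on the cost ratio, so the weights should reflect the organization's actual cost of a missed exfiltration versus an analyst investigating a false alarm.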

Hyperparameter Tuning

We searched for the best hyperparameters using randomized search and obtained slightly higher accuracy with the tuned parameters, but false negatives also increased from 25 to 53, so we kept the default hyperparameter values.
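The decision above amounts to a randomized search that only accepts candidates whose false negatives stay within a limit. The stdlib sketch below illustrates that idea under stated assumptions: `randomized_search`, the search space, and the toy `evaluate` function are hypothetical, and a real `evaluate` would train a model and score it on a validation split. In scikit-learn, `RandomizedSearchCV` performs the sampling, but encoding this kind of constraint would require a custom `scoring` function.

```python
import random


def randomized_search(param_space, evaluate, n_iter=20, max_fn=None, seed=0):
    """Sample `n_iter` parameter sets and keep the most accurate acceptable one.

    param_space: dict mapping parameter name -> list of candidate values.
    evaluate: callable(params) -> (accuracy, false_negatives).
    Returns (params, accuracy, fn), or None if no candidate satisfies max_fn.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in param_space.items()}
        accuracy, fn = evaluate(params)
        if max_fn is not None and fn > max_fn:
            continue  # reject: misses more malicious queries than the allowed limit
        if best is None or accuracy > best[1]:
            best = (params, accuracy, fn)
    return best


# Toy evaluator (hypothetical): deeper trees are more accurate but miss more attacks.
def toy_evaluate(params):
    depth = params["max_depth"]
    return depth / 10, depth * 10
```

If the search returns `None` with `max_fn` set to the default model's false-negative count, keeping the default hyperparameters is the consistent choice.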

Results

The highest accuracy we achieved is 82%. We expect this could be increased by adding stateful features to our current stateless features. These results show that DNS exfiltration detection is achievable with machine learning algorithms, with the advantage of fast, real-time detection. As with everything, this comes at a cost: we must retune the model regularly to keep its accuracy and other evaluation metrics high and to improve it further.
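Stateful features summarize a sequence of queries rather than a single one, for example how often a client queries new subdomains of the same domain. The sketch below is one possible design, assumed rather than taken from this project: the class name `StatefulFeatures`, the 60-second sliding window, and the three features are hypothetical, and it assumes timestamps arrive in non-decreasing order.

```python
from collections import defaultdict, deque


class StatefulFeatures:
    """Tracks queries per (client, base domain) within a sliding time window."""

    def __init__(self, window: float = 60.0):
        self.window = window
        # (client, base_domain) -> deque of (timestamp, subdomain)
        self.history = defaultdict(deque)

    def update(self, client: str, qname: str, ts: float) -> dict[str, int]:
        labels = qname.rstrip(".").split(".")
        base_domain = ".".join(labels[-2:])  # naive: ignores multi-part suffixes
        subdomain = ".".join(labels[:-2])
        q = self.history[(client, base_domain)]
        q.append((ts, subdomain))
        # Evict entries older than the window (assumes non-decreasing timestamps).
        while q and q[0][0] <= ts - self.window:
            q.popleft()
        return {
            "query_count": len(q),
            "unique_subdomains": len({sub for _, sub in q}),
            "total_subdomain_chars": sum(len(sub) for _, sub in q),
        }
```

A tunneling client typically produces a burst of queries with many unique, long subdomains under one base domain, which a single-query (stateless) view cannot capture.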

alansary/Detection-of-Data-Exfiltration-via-DNS

Folders and files

Latest commit

History

Repository files navigation

Detection of Data Exfiltration via DNS

Overview

Solution

Random Forest

Classification Report

Confusion Matrix

Decision Tree

Classification Report

Confusion Matrix

XGBoost

Classification Report

Confusion Matrix

Experiments

Model Evaluation

Hyperparameter Tuning

Results

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages