
Machine learning-based DNS classifier for detecting Domain Generation Algorithms (DGAs), tunneling, and data exfiltration by malicious actors.
> [!CAUTION]
> The project is under active development right now. Everything might change, break, or move around quickly.
To start the full pipeline via Docker Compose, run:

```sh
HOST_IP=127.0.0.1 docker compose -f docker/docker-compose.yml up
```
For a manual setup, create a virtual environment and install the requirements:

```sh
python -m venv .venv
source .venv/bin/activate
sh install_requirements.sh
```
Alternatively, you can use `pip install` and install all needed requirements individually with `-r requirements.*.txt`.
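For example, to install only the requirements of the training stage (this file name also appears in the training section below; the other stages follow the same `requirements.*.txt` pattern):

```sh
pip install -r requirements/requirements.train.txt
```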
Now, you can start each stage, e.g. the inspector:

```sh
python src/inspector/inspector.py
```
To configure heiDGAF according to your needs, use the provided `config.yaml`. The most relevant settings are related to your specific log line format, the model you want to use, and possibly your infrastructure.
The section `pipeline.log_collection.collector.logline_format` has to be adjusted to reflect your specific input log line format. Using our flexible log line configuration, you can rename, reorder, and fully configure each field of a valid log line. Freely define timestamps, RegEx patterns, lists, and IP addresses. For example, your configuration might look as follows:
- [ "timestamp", Timestamp, "%Y-%m-%dT%H:%M:%S.%fZ" ]
- [ "status_code", ListItem, [ "NOERROR", "NXDOMAIN" ], [ "NXDOMAIN" ] ]
- [ "client_ip", IpAddress ]
- [ "dns_server_ip", IpAddress ]
- [ "domain_name", RegEx, '^(?=.{1,253}$)((?!-)[A-Za-z0-9-]{1,63}(?<!-)\.)+[A-Za-z]{2,63}$' ]
- [ "record_type", ListItem, [ "A", "AAAA" ] ]
- [ "response_ip", IpAddress ]
- [ "size", RegEx, '^\d+b$' ]
The options `pipeline.data_inspection` and `pipeline.data_analysis` are relevant for configuring the model. The section `environment` can be fine-tuned to prevent naming collisions for Kafka topics and to adjust addressing in your environment.
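As a purely hypothetical sketch (the key names below are invented for illustration and the real schema may differ; consult `config.yaml` and the documentation), an adjusted section could look like:

```yaml
# Hypothetical keys, for illustration only; see config.yaml for the real schema.
environment:
  kafka_topic_prefix: "heidgaf-prod"      # avoid topic name collisions between deployments
  kafka_broker_address: "10.0.0.5:9092"   # broker reachable in your environment
```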
For more in-depth information on your options, have a look at our official documentation, where we provide tables explaining all values in detail.
To monitor the system and observe its real-time behavior, multiple Grafana dashboards have been set up:
- **Overview dashboard**: Contains the most relevant information on the system's runtime behavior, its efficiency, and its effectiveness.
- **Latencies dashboard**: Presents information on latencies, including comparisons between the modules and more detailed, stand-alone metrics.
- **Log Volumes dashboard**: Presents information on the fill level of each module, i.e. the number of entries currently held by the module for processing. Includes comparisons between the modules, more detailed stand-alone metrics, as well as total numbers of logs entering the pipeline or being marked as fully processed.
- **Alerts dashboard**: Presents details on the number of logs detected as malicious, including the IP addresses responsible for those alerts.
- **Dataset dashboard**: Only active in datatest mode. Users who want to test their own models can use this mode to inspect confusion matrices on testing data. Note that this feature is in a very early development stage.
To train and test our models, and possibly your own, we currently rely on the following datasets:
- CICBellDNS2021
- DGTA Benchmark
- DNS Tunneling Queries for Binary Classification
- UMUDGA - University of Murcia Domain Generation Algorithm Dataset
- DGArchive
We compute all features separately and rely only on the `domain` and `class` fields for binary classification.
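As a hypothetical illustration of this reduction (the column names and label encoding below are assumptions, not the project's actual training code):

```python
import pandas as pd

# Hypothetical sketch: reduce a dataset to the domain and class columns
# and derive a binary label. Not heiDGAF's actual training code.
df = pd.DataFrame({
    "domain": ["example.com", "q7xzt1kd9.net"],
    "class":  ["legit", "dga"],
})
df["label"] = (df["class"] != "legit").astype(int)  # 0 = benign, 1 = malicious
X, y = df[["domain"]], df["label"]                  # features are derived from the domain only
print(df)
```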
For testing purposes, we provide multiple scripts in the `scripts` directory. Use `real_logs.dev.py` to send data from the datasets into the pipeline. After downloading the dataset and storing it under `<project-root>/data`, run

```sh
python scripts/real_logs.dev.py
```

to start continuously inserting dataset traffic.
> [!IMPORTANT]
> This is only a brief wrap-up of a custom training process. We highly encourage you to have a look at the documentation for a full description and explanation of the configuration parameters.
We feature two trained models:

- XGBoost (`src/train/model.py#XGBoostModel`)
- RandomForest (`src/train/model.py#RandomForestModel`)
After installing the requirements, use `src/train/train.py`:
```sh
> python -m venv .venv
> source .venv/bin/activate
> pip install -r requirements/requirements.train.txt
> python src/train/train.py

Usage: train.py [OPTIONS] COMMAND [ARGS]...

Options:
  -h, --help  Show this message and exit.

Commands:
  explain
  test
  train
```
Setting up the dataset directories (and adding the code for your model class, if applicable) lets you start the training process by running the following command:

```sh
> python src/train/train.py train --dataset <dataset_type> --dataset_path <path/to/your/datasets> --model <model_name>
```
By default, the results are saved to `./results`, unless configured otherwise.
Testing and explaining a trained model works analogously:

```sh
> python src/train/train.py test --dataset <dataset_type> --dataset_path <path/to/your/datasets> --model <model_name> --model_path <path_to_model_version>
> python src/train/train.py explain --dataset <dataset_type> --dataset_path <path/to/your/datasets> --model <model_name> --model_path <path_to_model_version>
```
The `explain` command creates a `rules.txt` file containing the internals of the model and explaining the rules it has learned.
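For intuition, one way such human-readable rules can be derived from a tree-based model is sketched below using scikit-learn's `export_text`; the project's `explain` command may produce `rules.txt` differently, so treat this as an illustration of the idea only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_text

# Sketch only: dump the decision rules of each tree in a RandomForest
# to a rules.txt-style file. heiDGAF's explain command may differ.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = RandomForestClassifier(n_estimators=3, random_state=0).fit(X, y)

feature_names = [f"f{j}" for j in range(5)]
with open("rules.txt", "w") as fh:
    for i, tree in enumerate(model.estimators_):
        fh.write(f"--- tree {i} ---\n")
        fh.write(export_text(tree, feature_names=feature_names))
        fh.write("\n")
```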
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
Distributed under the EUPL License. See `LICENSE.txt` for more information.