CTINexus is a novel framework that leverages optimized in-context learning of LLMs to enable data-efficient extraction of cyber threat intelligence and the construction of high-quality cybersecurity knowledge graphs.

0060lulu/CTINexus

 
 


Automatic Cyber Threat Intelligence Knowledge Graph Construction Using Large Language Models

License: MIT

This is the repository of CTINexus, a novel framework that leverages optimized in-context learning (ICL) of large language models (LLMs) for data-efficient CTI knowledge extraction and high-quality cybersecurity knowledge graph (CSKG) construction. CTINexus requires neither extensive data nor parameter tuning, and it can adapt to various ontologies with minimal annotated examples.


News

🌟 [2025/06/14] Community spotlight — Jeff’s fork turns CTINexus into a containerized micro-service PoC with a Gradio UI. Submit text and instantly see the extracted intel and interactive graph!

🔥 [2025/04/21] We released the camera-ready paper on arXiv.

🔥 [2025/02/12] CTINexus has been accepted at the 2025 IEEE European Symposium on Security and Privacy (EuroS&P).

Introduction

CTINexus comprises the following modules:

  • IE: A carefully designed automatic prompt construction strategy with optimal demonstration retrieval for extracting a wide range of cybersecurity entities and relations;
  • A hierarchical entity alignment technique that canonicalizes the extracted knowledge and removes redundancy;
    • ET: Groups mentions of the same type.
    • EM: Merges mentions referring to the same entity with IOC protection.
  • LP: A long-distance relation prediction technique that further completes the CSKG by inferring missing links.
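
To make the IE module's "demonstration retrieval" idea concrete, here is a minimal sketch of similarity-based example selection for an in-context prompt. It is not the repository's implementation: the function names, the prompt wording, the demonstration pool, and the bag-of-words cosine similarity (a stand-in for a real text embedding) are all assumptions made for illustration. Placing the most similar demonstration closest to the query is likewise a design choice of this sketch.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased token counts; a simple stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_demonstrations(query, pool, k=2):
    """Return the k most similar (report, triplets) pairs, most similar last."""
    q = bag_of_words(query)
    ranked = sorted(pool, key=lambda demo: cosine(q, bag_of_words(demo[0])))
    return ranked[-k:]


def build_prompt(query, demos):
    """Assemble a few-shot prompt: instruction, demonstrations, then the query."""
    parts = ["Extract (subject, relation, object) triplets from the report."]
    for report, triplets in demos:
        parts.append(f"Report: {report}\nTriplets: {triplets}")
    parts.append(f"Report: {query}\nTriplets:")
    return "\n\n".join(parts)


# Hypothetical annotated demonstrations (made up for this example).
POOL = [
    ("APT28 used a phishing email to deliver malware.",
     [("APT28", "uses", "phishing email")]),
    ("The ransomware encrypts files on the victim host.",
     [("ransomware", "encrypts", "files")]),
    ("Quarterly revenue grew in the cloud segment.", []),
]

query = "A phishing email delivered ransomware to the host."
demos = retrieve_demonstrations(query, POOL, k=2)
prompt = build_prompt(query, demos)
```

The resulting `prompt` string would then be sent to the LLM. The unrelated revenue report is left out because it shares almost no vocabulary with the query.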

Quick Start

1. Prerequisites

pip install -r requirements.txt

2. Cybersecurity Triplet Extraction

  1. Update the configuration file. To use the optimal settings, simply insert your OpenAI API key.
  2. Run the following script to perform triplet extraction:
    sh tools/scripts/ie.sh
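
For readers new to triplet extraction, the sketch below shows one way to think about the output of this step: a list of (subject, relation, object) triplets that can be folded into a graph. The `Triplet` type, the `build_graph` helper, and the sample entities are hypothetical and are not taken from the repository's actual output format.

```python
from collections import defaultdict
from typing import NamedTuple


class Triplet(NamedTuple):
    """One extracted fact: subject --relation--> object."""
    subject: str
    relation: str
    object: str


def build_graph(triplets):
    """Map each subject to its outgoing (relation, object) edges."""
    graph = defaultdict(list)
    for t in triplets:
        graph[t.subject].append((t.relation, t.object))
    return dict(graph)


# Made-up triplets for illustration; 203.0.113.7 is a documentation-only IP.
triplets = [
    Triplet("ExampleLoader", "drops", "ExampleStealer"),
    Triplet("ExampleLoader", "communicates with", "203.0.113.7"),
    Triplet("ExampleStealer", "targets", "banking credentials"),
]
graph = build_graph(triplets)
```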

3. Hierarchical Entity Alignment

3.1 Coarse-grained Entity Typing

  1. Update the configuration file. To use the optimal settings, simply insert your OpenAI API key.
  2. Run the following script to perform entity typing:
    sh tools/scripts/et.sh
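
Conceptually, coarse-grained typing partitions mentions by type so that later merging only compares mentions within the same group. The sketch below assumes the type labels are already available (in CTINexus they come from the LLM). The `group_by_type` helper and the type names are illustrative assumptions, not the repository's ontology.

```python
from collections import defaultdict


def group_by_type(typed_mentions):
    """Group (mention, type) pairs by type, keeping first-seen order."""
    groups = defaultdict(list)
    for mention, entity_type in typed_mentions:
        if mention not in groups[entity_type]:  # skip exact duplicates
            groups[entity_type].append(mention)
    return dict(groups)


# Hypothetical typed mentions for illustration.
typed_mentions = [
    ("Lazarus Group", "Threat Actor"),
    ("ExampleLoader", "Malware"),
    ("Lazarus", "Threat Actor"),
    ("ExampleLoader", "Malware"),
]
groups = group_by_type(typed_mentions)
```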

3.2 Fine-grained Entity Merging

  1. Update the configuration files (config1, config2). To use the optimal settings, simply insert your OpenAI API key.
  2. Run the following script to perform entity alignment:
    sh tools/scripts/em.sh
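
The idea of "IOC protection" can be illustrated as follows: mentions that look like indicators of compromise are never merged, even when they are nearly identical as strings, because two similar IP addresses or hashes are still distinct indicators. This is a sketch under stated assumptions, not the repository's code. The regular expressions, the 0.8 threshold, and the use of `difflib` string similarity (in place of whatever similarity measure CTINexus uses) are all choices made for this example.

```python
import difflib
import re

# Assumed IOC patterns for this sketch: IPv4 addresses and MD5/SHA-1/SHA-256 hashes.
IOC_PATTERNS = [
    re.compile(r"(?:\d{1,3}\.){3}\d{1,3}"),
    re.compile(r"[a-fA-F0-9]{32}|[a-fA-F0-9]{40}|[a-fA-F0-9]{64}"),
]


def is_ioc(mention):
    """True if the whole mention matches one of the IOC patterns."""
    return any(p.fullmatch(mention) for p in IOC_PATTERNS)


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


def merge_mentions(mentions, threshold=0.8):
    """Greedily cluster similar mentions; IOCs always stay in their own cluster."""
    clusters = []
    for mention in mentions:
        target = None
        if not is_ioc(mention):
            for cluster in clusters:
                representative = cluster[0]
                if not is_ioc(representative) and similarity(mention, representative) >= threshold:
                    target = cluster
                    break
        if target is None:
            clusters.append([mention])
        else:
            target.append(mention)
    return clusters


mentions = ["Lazarus Group", "Lazarus group", "192.168.1.10", "192.168.1.11", "Cobalt Strike"]
clusters = merge_mentions(mentions)
```

Without the `is_ioc` guard, the two IP addresses would exceed the similarity threshold and collapse into a single node, losing one indicator.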

4. Long-Distance Relation Prediction

  1. Update the configuration file. To use the optimal settings, simply insert your OpenAI API key.
  2. Run the following script to predict long-distance relations:
    sh tools/scripts/lp.sh
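
One way to picture where "missing links" arise: the extracted graph often splits into disconnected subgraphs, and a link predictor can propose relations between them. The sketch below finds connected components, picks the highest-degree node in each as a representative, and lists candidate pairs that an LLM could then be asked to relate. Choosing representatives by degree is an assumption of this sketch, and the function names and sample edges are hypothetical. The repository's actual procedure may differ.

```python
from collections import defaultdict
from itertools import combinations


def connected_components(edges):
    """Return (components, adjacency) for the undirected view of the graph."""
    adjacency = defaultdict(set)
    for head, tail in edges:
        adjacency[head].add(tail)
        adjacency[tail].add(head)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node] - seen:
                seen.add(neighbor)
                stack.append(neighbor)
        components.append(component)
    return components, adjacency


def candidate_links(edges):
    """Pair up the highest-degree node of each disconnected subgraph."""
    components, adjacency = connected_components(edges)
    # sorted() makes tie-breaking deterministic (alphabetically first wins).
    centers = [max(sorted(c), key=lambda n: len(adjacency[n])) for c in components]
    return list(combinations(centers, 2))


# Made-up edges forming two disconnected subgraphs.
edges = [
    ("actor", "malware"),
    ("malware", "c2 server"),
    ("malware", "dropper"),
    ("exploit", "vulnerability"),
    ("exploit", "patch"),
]
pairs = candidate_links(edges)
```

Each pair in `pairs` is only a candidate: the relation label (or the decision that no relation exists) would still come from the model.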

Citation

We hope our work serves as a foundation for further LLM applications in the CTI analysis community. If you find it helpful for your research, please consider citing our paper! ❤️

@inproceedings{cheng2025ctinexusautomaticcyberthreat,
      title={CTINexus: Automatic Cyber Threat Intelligence Knowledge Graph Construction Using Large Language Models}, 
      author={Yutong Cheng and Osama Bajaber and Saimon Amanuel Tsegai and Dawn Song and Peng Gao},
      booktitle={2025 IEEE European Symposium on Security and Privacy (EuroS\&P)},
      year={2025},
      organization={IEEE}
}

License

The source code is licensed under the MIT License. We warmly welcome industry collaboration. If you’re interested in building on CTINexus or exploring joint initiatives, please email [email protected]—we’d be happy to set up a brief call to discuss ideas.
