Unlock the hidden structure and insights within your codebase by transforming it into a powerful, queryable knowledge graph!
This project turns complex source code into a structured, queryable knowledge graph. Using semantic analysis (primarily via VS Code Language Server capabilities, and historically through ANTLR), it extracts information about your code's entities, relationships, and architecture.
- Knowledge Graph Generation: Converts source code from various languages into a rich graph structure.
- VS Code Integration: Primarily utilizes VS Code's powerful language servers for parsing and symbol extraction, providing broad language support out-of-the-box.
- Language Agnostic (via VS Code): The VS Code module aims to support any language for which a good Language Server Protocol (LSP) implementation exists.
- Detailed Symbol & Relationship Extraction: Captures classes, functions, variables, calls, inheritance, implementations, and more.
- File System Awareness: Includes tools for intelligently walking file trees, respecting .gitignore patterns.
- MinHashing for Similarity: Implements MinHash for locality-sensitive hashing, useful for detecting near-duplicate code snippets or tracking semantic drift.
- (Historical) ANTLR-based Parsing & Querying: Features a sophisticated, though now deprecated, ANTLR-based parsing pipeline with a custom AST Query Language (bevel_ast_ql) for fine-grained code analysis.
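To make the MinHash idea concrete, here is a minimal, self-contained sketch of how near-duplicate snippets can be compared. It is not the project's MinHasher (which builds on hash4j); the function names shingles, minHashSignature, mix, and estimateJaccard are hypothetical, and the integer mixer is a toy chosen only to keep the example dependency-free.

```kotlin
// Split text into overlapping word shingles of the given size.
fun shingles(text: String, size: Int = 3): Set<String> {
    val tokens = text.split(Regex("\\s+")).filter { it.isNotEmpty() }
    if (tokens.isEmpty()) return emptySet()
    if (tokens.size < size) return setOf(tokens.joinToString(" "))
    return tokens.windowed(size).map { it.joinToString(" ") }.toSet()
}

// Cheap seeded integer mixer (illustrative only, not a production-quality hash).
fun mix(value: Int, seed: Int): Int {
    var h = value xor (seed * 0x27D4EB2F)
    h = (h xor (h ushr 16)) * 0x45D9F3B
    h = (h xor (h ushr 16)) * 0x45D9F3B
    return h xor (h ushr 16)
}

// One signature slot per seeded hash function: keep the minimum hash over all shingles.
fun minHashSignature(items: Set<String>, numHashes: Int = 128): IntArray =
    IntArray(numHashes) { seed ->
        items.minOfOrNull { mix(it.hashCode(), seed) } ?: Int.MAX_VALUE
    }

// The fraction of matching slots estimates the Jaccard similarity of the two shingle sets.
fun estimateJaccard(a: IntArray, b: IntArray): Double {
    require(a.size == b.size) { "signatures must have the same length" }
    return a.indices.count { a[it] == b[it] }.toDouble() / a.size
}

fun main() {
    val original = "fun total(items: List<Int>): Int { return items.sum() }"
    val edited = "fun total(items: List<Int>): Int { return items.sum() + 0 }"
    val similarity = estimateJaccard(
        minHashSignature(shingles(original)),
        minHashSignature(shingles(edited))
    )
    println("Estimated similarity: $similarity")
}
```

Because signatures have a fixed length, they can be stored per node and compared cheaply later, for example to notice that a function has drifted from an earlier version.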
Understanding large, evolving codebases is a monumental task. This project aims to make it easier by enabling:
- Deep Code Understanding: Visualize and query relationships between components.
- Impact Analysis: Understand the ripple effects of changes.
- Architectural Insights: Discover design patterns, dependencies, and potential issues.
- Foundation for Custom Tooling: Build linters, documentation generators, or advanced refactoring tools on top of the graph.
- AI & LLM Augmentation: Provide rich, structured context about code to Large Language Models.
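As an illustration of what impact analysis on a code graph can look like, the sketch below walks dependency edges backwards from a changed symbol to find everything that may be affected. The Edge type and the impactedBy function are hypothetical and are not part of this project's API; a real graph would carry richer node and connection types.

```kotlin
// Hypothetical edge: `from` depends on (calls, extends, uses) `to`.
data class Edge(val from: String, val to: String)

// Collect every symbol that transitively depends on `changed`,
// using a breadth-first search over the reversed edges.
fun impactedBy(changed: String, edges: List<Edge>): Set<String> {
    val dependents = edges.groupBy({ it.to }, { it.from })
    val seen = mutableSetOf<String>()
    val queue = ArrayDeque(listOf(changed))
    while (queue.isNotEmpty()) {
        val current = queue.removeFirst()
        for (dependent in dependents[current].orEmpty()) {
            // Skip the starting symbol itself (possible in cycles) and anything already visited.
            if (dependent != changed && seen.add(dependent)) queue.addLast(dependent)
        }
    }
    return seen
}

fun main() {
    val edges = listOf(
        Edge("Controller.handle", "Service.save"),
        Edge("Service.save", "Repo.insert"),
        Edge("Report.run", "Repo.insert"),
        Edge("Util.log", "Clock.now")
    )
    println(impactedBy("Repo.insert", edges))
}
```

The same traversal, run over forward edges instead, would list what a symbol depends on, which is a natural way to gather context for an LLM prompt.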
The project is organized into several key modules:
- vscode/ (Primary & Active):
  - Interfaces with VS Code's language services to extract symbols, references, and definitions.
  - VsCodeParser.kt: Orchestrates symbol extraction from files.
  - VsCodeConnectionParser.kt: Infers connections (calls, inheritance, etc.) from symbols.
  - languageSpecs/: Contains language-specific configurations and heuristics to refine VS Code's output.
  - This is the recommended and actively developed approach for parsing.
- antlr/ (⚠️ Largely Deprecated & Non-Functional):
  - Contains ANTLR grammars for various languages (Kotlin, C#, JavaScript).
  - Includes a custom AST Query Language (bevel_ast_ql/) for defining patterns to extract nodes and relationships from ANTLR parse trees.
  - QueryBasedAntlrParser.kt & ConverterBasedAntlrParser.kt: Historical parsers using this system.
  - Important: This module underwent a large-scale refactor and is no longer functional in its current state. It is preserved for its valuable query language design and for potential future reintegration if specific deep-parsing needs arise that VS Code LSP cannot fulfill. The documentation in docs/ primarily refers to this deprecated system.
- providers/:
  - GitignoreAwareFileWalker.kt: Efficiently traverses project directories, respecting .gitignore.
  - MinHasher.kt: Implements MinHash for code similarity analysis.
- regex/ (⚠️ Deprecated):
  - Contains older, regex-based parsers for specific frameworks (e.g., AngularJS). Not actively maintained.
- Root-level Scripts (combine_*.sh, combine_files.py):
  - Utilities to package the codebase itself into a single text file. Useful for providing context to LLMs or for archiving.
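For readers curious what a .gitignore-aware traversal involves, here is a deliberately simplified sketch. It is not the GitignoreAwareFileWalker implementation: the function names are hypothetical, only the root .gitignore is read, and negation (!), path anchoring, and nested .gitignore files are not handled. Patterns are treated as globs matched against file or directory names.

```kotlin
import java.nio.file.FileSystems
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.PathMatcher

// Turn each usable line of the root .gitignore into a glob matcher on entry names.
// Simplification: comments and negations are skipped, leading/trailing slashes are dropped.
fun loadIgnoreMatchers(root: Path): List<PathMatcher> {
    val ignoreFile = root.resolve(".gitignore")
    if (!Files.exists(ignoreFile)) return emptyList()
    return Files.readAllLines(ignoreFile)
        .map { it.trim() }
        .filter { it.isNotEmpty() && !it.startsWith("#") && !it.startsWith("!") }
        .map { it.trim('/') }
        .filter { it.isNotEmpty() }
        .map { FileSystems.getDefault().getPathMatcher("glob:$it") }
}

// Recursively list files under root, skipping .git and anything whose name matches a pattern.
fun walkRespectingGitignore(root: Path): List<Path> {
    val matchers = loadIgnoreMatchers(root)
    val result = mutableListOf<Path>()
    fun visit(dir: Path) {
        Files.newDirectoryStream(dir).use { entries ->
            for (entry in entries) {
                val name = entry.fileName
                if (name.toString() == ".git" || matchers.any { it.matches(name) }) continue
                if (Files.isDirectory(entry)) visit(entry) else result.add(entry)
            }
        }
    }
    visit(root)
    return result.sorted()
}

fun main() {
    // Build a small throwaway project to walk.
    val root = Files.createTempDirectory("walker-demo")
    Files.write(root.resolve(".gitignore"), listOf("build/", "*.log"))
    Files.createDirectories(root.resolve("build"))
    Files.createDirectories(root.resolve("src"))
    Files.write(root.resolve("build").resolve("out.class"), listOf("x"))
    Files.write(root.resolve("src").resolve("Main.kt"), listOf("fun main() {}"))
    Files.write(root.resolve("debug.log"), listOf("noise"))
    walkRespectingGitignore(root).forEach { println(root.relativize(it)) }
}
```

Pruning ignored directories during the walk, rather than filtering afterwards, is what keeps traversal cheap on projects with large build or dependency folders.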
- File Discovery: The GitignoreAwareFileWalker scans the target project for relevant source files.
- Symbol Extraction (via VS Code): For each file, the VsCodeParser communicates with VS Code (or a compatible LSP client) to request document symbols, workspace symbols, definitions, and references.
- Graph Node Creation: Extracted symbols (classes, functions, variables, etc.) are transformed into nodes in the knowledge graph.
- Relationship Inference: The VsCodeConnectionParser analyzes the symbol information (e.g., call hierarchies, type definitions, inheritance) to create connections (edges) between nodes.
- Graph Augmentation & Refinement: Language-specific logic in vscode/languageSpecs/ can further refine the graph, adding more detailed connections or node properties.
- (Optional) Hashing: MinHasher can be used to generate semantic fingerprints of code blocks or files.
The code-to-knowledge-graph project is the engine behind an ecosystem of developer tools for exploring complex codebases. These are the main use cases and the tools that build on the knowledge graph:
Tool: Bevel Neo4j Visualization
Transform your codebase knowledge graph into stunning, interactive visualizations within VS Code.
Tool: Bevel Test Generator
Leverage the knowledge graph to create comprehensive test prompts for AI coding assistants.
Tool: Bevel Software Extension
Generate interactive sequence diagrams and call graphs directly from your codebase analysis.
Tool: Direct Integration with Knowledge Graph API
Build your own analysis tools using the knowledge graph foundation.
- 🏗️ Solid Foundation: The knowledge graph provides a consistent, queryable representation of code structure
- 🔧 Specialized Tools: Each tool focuses on specific use cases while sharing the same data foundation
- 🤝 Seamless Integration: Tools work together, with each providing unique value
- 🌱 Extensible: Build custom solutions on top of the knowledge graph API
- 📈 Scalable: From individual functions to enterprise codebases
Whether you're dealing with legacy systems, complex architectures, or simply want to understand your code better, this ecosystem provides the tools to transform raw source code into actionable insights and beautiful visualizations.
The easiest way to get started is by using the pre-built VS Code extensions that leverage this knowledge graph technology:
- Install the main Bevel Extension from the VS Code marketplace
- Open your codebase in VS Code and run Bevel: Re-/Analyze Project from the Command Palette
- Explore with additional tools:
  - Install Bevel Neo4j Visualization for graph exploration
  - Install Bevel Test Generator for AI-assisted testing
This approach gives you immediate access to the knowledge graph capabilities without needing to build from source.
If you want to build custom tools or contribute to the core engine:
- Java Development Kit (JDK) 17 or higher
- Gradle (the project uses the Gradle wrapper, so it will be downloaded automatically)
- (For vscode module functionality) A running instance of VS Code or a compatible LSP server setup that the tool can communicate with (details depend on the specific runner implementation).
- Python 3 for the combine_*.sh scripts (pip3 install -r requirements.txt).
- Clone the repository:

      git clone https://github.com/Bevel-Software/code-to-knowledge-graph.git
      cd code-to-knowledge-graph

- Build the project using Gradle:

      ./gradlew build

  This will compile the Kotlin/Java code, run tests, and produce necessary artifacts.
The primary way to leverage the knowledge graph generation capabilities is by integrating the parsers into your own applications or analysis scripts. The Factories.kt file (in src/main/kotlin/) provides convenient factory methods to instantiate the core components for the vscode module.
Here's a conceptual example of how you might use these factories:
    // Example (Conceptual - actual API may vary, check Factories.kt for precise signatures)
    import createVsCodeParser
    import createVsCodeConnectionParser
    // ... other necessary imports from graph_domain, file_system_domain, etc.

    fun main() {
        val projectPath = "/path/to/your/codebase" // Ensure this project has a .bevel/port file if not providing commsChannel

        // 1. Create a VsCodeParser instance using the factory
        //    This handles setting up dependencies like communication channels, file handlers, etc.
        val vsCodeParser = createVsCodeParser(projectPath = projectPath)

        // 2. Parse the project to get an initial graph of nodes
        //    The parseToGraphBuilder method returns a GraphBuilder instance.
        val graphBuilder = vsCodeParser.parseToGraphBuilder(listOf(projectPath))

        // 3. Optionally, create a VsCodeConnectionParser to infer more connections
        val vsCodeConnectionParser = createVsCodeConnectionParser(
            projectPath = projectPath,
            // languageSpecification and fileHandler might be shared or re-instantiated
            // commsChannel can be reused if vsCodeParser created one, or a new one can be made
        )

        // 4. Build the final graph and then enhance it with more connections
        //    Note: The exact methods and flow for connection parsing might vary.
        //    The VsCodeConnectionParser typically operates on a Graphlike object.
        var graph = graphBuilder.build(projectPath)
        graph = vsCodeConnectionParser.addOutboundConnections(graph)
        // graph = vsCodeConnectionParser.addInboundConnections(graph) // Or similar methods

        // Now you have 'graph' (a Graphlike object) to work with!
        // You can query its nodes and connections.
        println("Parsed ${graph.nodes.size} nodes and ${graph.connections.getAllConnections().size} connections.")
    }

Contributions are welcome! Whether it's improving the VS Code integration, adding new language-specific handlers, enhancing the graph model, or fixing bugs, your help is appreciated.
- Fork the repository.
- Create your feature branch (git checkout -b feature/AmazingFeature).
- Commit your changes (git commit -m 'Add some AmazingFeature').
- Push to the branch (git push origin feature/AmazingFeature).
- Open a Pull Request.
Please make sure your code adheres to the existing style and that all tests pass.
This project is licensed under the Mozilla Public License Version 2.0. See the LICENSE file for details.
The NOTICE file contains information about licenses of third-party dependencies.
- The ANTLR project for their powerful parser generator (though our ANTLR module is currently deprecated).
- The Dynatrace hash4j library for MinHash implementation.
- The broader Language Server Protocol (LSP) community for enabling robust cross-editor language intelligence.
- All contributors and users of this project.
Happy Coding & Graphing! 🧑‍💻➡️📊





