tree-sitter

Tree-sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited. Tree-sitter aims to be:

General enough to parse any programming language
Fast enough to parse on every keystroke in a text editor
Robust enough to provide useful results even in the presence of syntax errors
Dependency-free so that the runtime library (which is written in pure C) can be embedded in any application

Corpus Testing

Tree-sitter uses a corpus-based testing approach to validate grammar implementations. This section explains how to use the corpus testing functionality.

Understanding Corpus Files

Corpus files (.txt) contain test cases with input code and expected parse trees. They follow this format:

===================
SECTION NAME
===================

source_code_example;

---

(expected_parse_tree
  (with_nodes))
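
As an illustration of the format above, here is a minimal sketch of how such a file could be split into test cases. It is not the code in scripts/corpus-parser.js; the function name `parseCorpusText` and the returned field names are assumptions made for this example, and it assumes header rules are lines of three or more `=` and the separator is a line of three or more `-`.

```javascript
// Sketch only: splits corpus text into { name, source, tree, line } records.
// parseCorpusText and its field names are hypothetical, not the repository's API.
function parseCorpusText(text) {
  const lines = text.split(/\r?\n/);
  const isHeaderRule = (line) => /^={3,}\s*$/.test(line);
  const isSeparator = (line) => /^-{3,}\s*$/.test(line);
  const cases = [];
  let i = 0;

  while (i < lines.length) {
    // Skip anything that is not the opening rule of a header.
    if (!isHeaderRule(lines[i])) {
      i++;
      continue;
    }
    const headerLine = i + 1; // 1-based line number of the opening rule
    i++;

    // Name lines run until the closing rule.
    const nameLines = [];
    while (i < lines.length && !isHeaderRule(lines[i])) nameLines.push(lines[i++]);
    i++; // skip the closing rule

    // Source runs until the --- separator.
    const sourceLines = [];
    while (i < lines.length && !isSeparator(lines[i])) sourceLines.push(lines[i++]);
    i++; // skip the separator

    // Expected tree runs until the next header or end of file.
    const treeLines = [];
    while (i < lines.length && !isHeaderRule(lines[i])) treeLines.push(lines[i++]);

    cases.push({
      name: nameLines.join(' ').trim(),
      source: sourceLines.join('\n').trim(),
      tree: treeLines.join('\n').trim(),
      line: headerLine,
    });
  }
  return cases;
}
```

This sketch treats any line of `=` characters as a header rule, so source code that itself contains such a line would need a stricter rule.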

Corpus Parsing Implementation

The corpus parsing is implemented in two main files:

scripts/corpus-parser.js - Core parsing logic as a reusable class
scripts/corpus-test.js - Command-line interface for working with corpus files

Using the Corpus Parser in Your Project

To use the corpus parser in your own application:

Copy the corpus-parser.js file to your project
Import and use the parser:

const CorpusParser = require('./path/to/corpus-parser');

async function testGrammar() {
  const parser = new CorpusParser();
  // parseFile resolves to the sections found in the corpus file
  const sections = await parser.parseFile('/path/to/corpus.txt');

  // Process sections and examples
  for (const section of sections) {
    for (const example of section.examples) {
      console.log(`Testing: ${example.source}`);
      // Parse with your grammar and compare to example.tree
    }
  }
}

// Run the test and report any file or parse errors
testGrammar().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
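
The comparison step left as a comment above could be handled in several ways. One simple option, sketched here under the assumption that both the expected tree and your grammar's output are S-expression strings, is to normalize whitespace before comparing so that indentation differences do not cause false mismatches. The helpers `normalizeTree` and `treesMatch` are hypothetical names, not part of corpus-parser.js.

```javascript
// Sketch only: whitespace-insensitive comparison of S-expression strings.
// normalizeTree and treesMatch are hypothetical helpers.
function normalizeTree(sexp) {
  return sexp
    .replace(/\s+/g, ' ')   // collapse newlines and runs of spaces
    .replace(/\(\s+/g, '(') // drop whitespace after "("
    .replace(/\s+\)/g, ')') // drop whitespace before ")"
    .trim();
}

function treesMatch(expected, actual) {
  return normalizeTree(expected) === normalizeTree(actual);
}
```

Inside the loop, `treesMatch(example.tree, actualTree)` would then report whether a case passed, where `actualTree` is whatever string your parser produces for `example.source`.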

Command-line Usage

The corpus-test.js script provides several commands to work with corpus files:

# Parse a corpus file and output as JSON
node scripts/corpus-test.js parse path/to/corpus.txt

# Validate a corpus against a grammar
node scripts/corpus-test.js validate path/to/grammar.js path/to/corpus.txt

# Extract all corpus files from a directory
node scripts/corpus-test.js extract path/to/directory

# Print a summary of a corpus file
node scripts/corpus-test.js summary path/to/corpus.txt
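
How the `extract` command locates corpus files is not described here. As a rough sketch of one plausible approach, the snippet below walks a directory recursively and collects every `.txt` file. The function name `findCorpusFiles` and the assumption that corpus files are identified purely by extension are illustrative, not taken from corpus-test.js.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Sketch only: recursively collect .txt files under dir.
// findCorpusFiles is a hypothetical helper, not the script's actual code.
function findCorpusFiles(dir) {
  const found = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      // Descend into subdirectories.
      found.push(...findCorpusFiles(full));
    } else if (entry.isFile() && entry.name.endsWith('.txt')) {
      found.push(full);
    }
  }
  // Sort for a stable, reproducible order.
  return found.sort();
}
```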

Metadata Available

The parser extracts rich metadata from corpus files:

Section information (name, description)
Example details (source code, expected parse tree)
File location information (filepath, line numbers)

This metadata can be used for generating reports, IDE integrations, or custom testing frameworks.
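
As a small illustration of the reporting use case, the sketch below turns parsed sections into a plain-text summary. Only `section.examples` appears in the usage example earlier; the field names `section.name`, `example.filepath`, and `example.line` are assumptions based on the metadata listed above and may differ in the actual parser output.

```javascript
// Sketch only: build a text report from parsed sections.
// section.name, example.filepath and example.line are assumed field names.
function summarize(sections) {
  const lines = [];
  let total = 0;
  for (const section of sections) {
    const count = section.examples.length;
    total += count;
    lines.push(`${section.name}: ${count} example(s)`);
    for (const example of section.examples) {
      // One location line per example, e.g. "  corpus.txt:5"
      lines.push(`  ${example.filepath}:${example.line}`);
    }
  }
  lines.push(`Total: ${total} example(s) in ${sections.length} section(s)`);
  return lines.join('\n');
}
```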
