Commit 87ad0fb

Expand using parsers document
1 parent a8bcd2c commit 87ad0fb

1 file changed: +35 -14 lines changed


docs/section-2-using-parsers.md

Lines changed: 35 additions & 14 deletions
@@ -5,9 +5,19 @@ permalink: using-parsers
55

66
# Using Parsers
77

8-
A Tree-sitter parser consists of a single C source file which exports one function with the naming scheme `tree_sitter_${LANGUAGE_NAME}`. This function returns a pointer to a `TSLanguage` struct, which can be used in conjunction with a `TSParser` to produce syntax trees.
8+
All of Tree-sitter's parsing functionality is exposed through C APIs. Applications written in higher-level languages can use Tree-sitter via binding libraries like [node-tree-sitter](https://github.com/tree-sitter/node-tree-sitter) or [rust-tree-sitter](https://github.com/tree-sitter/rust-tree-sitter), which have their own documentation.
99

10-
## The Raw C API
10+
This document describes the general concepts of how to use Tree-sitter, which should be relevant regardless of what language you're using. It also goes into some C-specific details that are useful if you're using the C API directly or are building a new binding to a different language.
11+
12+
## The Object Model
13+
14+
There are four main types of objects involved when using Tree-sitter: languages, parsers, syntax trees, and syntax nodes. In C, these are called `TSLanguage`, `TSParser`, `TSTree`, and `TSNode`.
15+
* A `TSLanguage` is an opaque object that defines how to parse a particular programming language. The code for each `TSLanguage` is generated by Tree-sitter. Many languages are already available in separate git repositories within the [Tree-sitter GitHub organization](https://github.com/tree-sitter). See [the next section](/creating-parsers) for how to create new languages.
16+
* A `TSParser` is a stateful object that can be assigned a `TSLanguage` and used to produce a `TSTree` based on some source code.
17+
* A `TSTree` represents the syntax tree of an entire source code file. It contains `TSNode` instances that indicate the structure of the source code. It can also be edited and used to produce a new `TSTree` in the event that the source code changes.
18+
* A `TSNode` represents a single node in the syntax tree. It tracks its start and end positions in the source code, as well as its relation to other nodes like its parent, siblings and children.
19+
20+
## An Example Program
1121

1222
Here's an example of a simple C program that uses the Tree-sitter [JSON parser](https://github.com/tree-sitter/tree-sitter-json).
1323

@@ -19,26 +29,37 @@ Here's an example of a simple C program that uses the Tree-sitter [JSON parser](
1929
#include <stdio.h>
2030
#include "tree_sitter/runtime.h"
2131

32+
// Declare the `tree_sitter_json` function, which is
33+
// implemented by the `tree-sitter-json` library.
2234
TSLanguage *tree_sitter_json();
2335

2436
int main() {
25-
// Create a parser with the JSON language.
37+
// Create a parser.
2638
TSParser *parser = ts_parser_new();
39+
40+
// Set the parser's language (JSON in this case).
2741
ts_parser_set_language(parser, tree_sitter_json());
2842

29-
// Parse some source code.
43+
// Build a syntax tree based on source code stored in a string.
3044
const char *source_code = "[1, null]";
31-
TSTree *tree = ts_parser_parse_string(parser, NULL, source_code, strlen(source_code));
32-
33-
// Find some syntax tree nodes.
45+
TSTree *tree = ts_parser_parse_string(
46+
parser,
47+
NULL,
48+
source_code,
49+
strlen(source_code)
50+
);
51+
52+
// Get the root node of the syntax tree.
3453
TSNode root_node = ts_tree_root_node(tree);
54+
55+
// Get some child nodes.
3556
TSNode array_node = ts_node_named_child(root_node, 0);
3657
TSNode number_node = ts_node_named_child(array_node, 0);
3758

3859
// Check that the nodes have the expected types.
39-
assert(!strcmp(ts_node_type(root_node), "value"));
40-
assert(!strcmp(ts_node_type(array_node), "array"));
41-
assert(!strcmp(ts_node_type(number_node), "number"));
60+
assert(strcmp(ts_node_type(root_node), "value") == 0);
61+
assert(strcmp(ts_node_type(array_node), "array") == 0);
62+
assert(strcmp(ts_node_type(number_node), "number") == 0);
4263

4364
// Check that the nodes have the expected child counts.
4465
assert(ts_node_child_count(root_node) == 1);
@@ -50,15 +71,15 @@ int main() {
5071
char *string = ts_node_string(root_node);
5172
printf("Syntax tree: %s\n", string);
5273

53-
// Free all of the heap allocations.
74+
// Free all of the heap-allocated memory.
5475
free(string);
5576
ts_tree_delete(tree);
5677
ts_parser_delete(parser);
5778
return 0;
5879
}
5980
```
6081

61-
This program uses the Tree-sitter C API, which is declared in the header file `tree_sitter/runtime.h`, so we need to add the `tree_sitter/include` directory to the include path. We also need to link `libruntime.a` into the binary.
82+
This program uses the Tree-sitter C API, which is declared in the header file `tree_sitter/runtime.h`, so we need to add the `tree_sitter/include` directory to the include path. We also need to link `libruntime.a` into the binary. We compile the source code of the JSON language directly into the binary as well.
6283

6384
```sh
6485
clang \
@@ -71,11 +92,11 @@ clang \
7192
./test-json-parser
7293
```
7394

74-
### Providing the text to parse
95+
## Providing the text to parse
7596

7697
Text input is provided to a Tree-sitter parser via a `TSInput` struct, which specifies a function pointer for reading chunks of text. The text can be encoded in either UTF-8 or UTF-16. This interface allows you to efficiently parse text that is stored in your own data structure.
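
To make the idea of a chunk-reading callback concrete, here is a minimal sketch of one for text that is stored as several separate pieces. The `ChunkedText` type and the `chunked_text_read` function, including its signature, are hypothetical and only illustrate the pattern; they are not the real `TSInput` definition, so check `tree_sitter/runtime.h` for the exact fields and callback signature before adapting this.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical document storage: the text is split into separately
// stored, NUL-terminated pieces instead of one contiguous string.
typedef struct {
  const char **chunks;
  size_t chunk_count;
} ChunkedText;

// Hypothetical read callback (an assumption, not the real TSInput
// signature). Returns a pointer to the bytes starting at `byte_index`
// and stores the number of contiguous bytes available in `*bytes_read`.
// At the end of the text it returns an empty string and stores 0.
const char *chunked_text_read(void *payload, uint32_t byte_index,
                              uint32_t *bytes_read) {
  const ChunkedText *text = (const ChunkedText *)payload;
  uint32_t chunk_start = 0;
  for (size_t i = 0; i < text->chunk_count; i++) {
    uint32_t chunk_length = (uint32_t)strlen(text->chunks[i]);
    if (byte_index < chunk_start + chunk_length) {
      // The requested offset falls inside this chunk.
      uint32_t offset = byte_index - chunk_start;
      *bytes_read = chunk_length - offset;
      return text->chunks[i] + offset;
    }
    chunk_start += chunk_length;
  }
  *bytes_read = 0;
  return "";
}
```

Because the callback hands out pointers into the existing chunks, the text never has to be copied into one contiguous buffer before it is read.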
7798

78-
### Querying the syntax tree
99+
## Querying the syntax tree
79100

80101
Tree-sitter provides a DOM-style interface for inspecting syntax trees. Functions like `ts_node_child(node, index)` and `ts_node_next_sibling(node)` expose every node in the concrete syntax tree. This is useful for operations like syntax-highlighting, which operate on a token-by-token basis. You can also traverse the tree in a more abstract way by using functions like
81102
`ts_node_named_child(node, index)` and `ts_node_next_named_sibling(node)`. These functions don't expose nodes that were specified in the grammar as anonymous tokens, like `:` and `{`. This is useful when analyzing the meaning of a document.
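
The difference between the two styles of traversal can be sketched with a toy model. The `ToyNode` struct and the `toy_node_*` functions below are hypothetical stand-ins written for this illustration; they are not part of the Tree-sitter API and only mimic how the named accessors skip anonymous tokens.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Toy stand-in for a syntax node (hypothetical, not TSNode).
typedef struct ToyNode {
  const char *type;
  bool is_named;                   // false for anonymous tokens like ":"
  const struct ToyNode *children;
  uint32_t child_count;
} ToyNode;

// Concrete view: every child is visible, including anonymous tokens.
const ToyNode *toy_node_child(const ToyNode *node, uint32_t index) {
  return index < node->child_count ? &node->children[index] : NULL;
}

// Abstract view: count only the named children.
uint32_t toy_node_named_child_count(const ToyNode *node) {
  uint32_t count = 0;
  for (uint32_t i = 0; i < node->child_count; i++) {
    if (node->children[i].is_named) count++;
  }
  return count;
}

// Abstract view: return the index-th named child, skipping anonymous ones.
const ToyNode *toy_node_named_child(const ToyNode *node, uint32_t index) {
  for (uint32_t i = 0; i < node->child_count; i++) {
    const ToyNode *child = &node->children[i];
    if (!child->is_named) continue;
    if (index == 0) return child;
    index--;
  }
  return NULL;
}
```

For a key/value node with the children `string`, `:`, and `number`, the concrete accessor at index 1 yields the `:` token, while the named accessor at index 1 yields the `number` node.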
