docs/section-2-using-parsers.md
35 additions & 14 deletions
@@ -5,9 +5,19 @@ permalink: using-parsers
 
 # Using Parsers
 
-A Tree-sitter parser consists of a single C source file which exports one function with the naming scheme `tree_sitter_${LANGUAGE_NAME}`. This function returns a pointer to a `TSLanguage` struct, which can be used in conjunction with a `TSParser` to produce syntax trees.
+All of Tree-sitter's parsing functionality is exposed through C APIs. Applications written in higher-level languages can use Tree-sitter via binding libraries like [node-tree-sitter](https://github.com/tree-sitter/node-tree-sitter) or [rust-tree-sitter](https://github.com/tree-sitter/rust-tree-sitter), which have their own documentation.
 
-## The Raw C API
+This document describes the general concepts of how to use Tree-sitter, which should be relevant regardless of what language you're using. It also goes into some C-specific details that are useful if you're using the C API directly or are building a new binding to a different language.
+
+## The Object Model
+
+There are four main types of objects involved when using Tree-sitter: languages, parsers, syntax trees, and syntax nodes. In C, these are called `TSLanguage`, `TSParser`, `TSTree`, and `TSNode`.
+* A `TSLanguage` is an opaque object that defines how to parse a particular programming language. The code for each `TSLanguage` is generated by Tree-sitter. Many languages are already available in separate git repositories within the [Tree-sitter GitHub organization](https://github.com/tree-sitter). See [the next section](/creating-parsers) for how to create new languages.
+* A `TSParser` is a stateful object that can be assigned a `TSLanguage` and used to produce a `TSTree` based on some source code.
+* A `TSTree` represents the syntax tree of an entire source code file. It contains `TSNode` instances that indicate the structure of the source code. It can also be edited and used to produce a new `TSTree` in the event that the source code changes.
+* A `TSNode` represents a single node in the syntax tree. It tracks its start and end positions in the source code, as well as its relation to other nodes like its parent, siblings and children.
+
+## An Example Program
 
 Here's an example of a simple C program that uses the Tree-sitter [JSON parser](https://github.com/tree-sitter/tree-sitter-json).
 
@@ -19,26 +29,37 @@ Here's an example of a simple C program that uses the Tree-sitter [JSON parser](
 #include <stdio.h>
 #include "tree_sitter/runtime.h"
 
+// Declare the `tree_sitter_json` function, which is
   // Check that the nodes have the expected child counts.
   assert(ts_node_child_count(root_node) == 1);
@@ -50,15 +71,15 @@ int main() {
   char *string = ts_node_string(root_node);
   printf("Syntax tree: %s\n", string);
 
-  // Free all of the heap allocations.
+  // Free all of the heap-allocated memory.
   free(string);
   ts_tree_delete(tree);
   ts_parser_delete(parser);
   return 0;
 }
 ```
 
-This program uses the Tree-sitter C API, which is declared in the header file `tree_sitter/runtime.h`, so we need to add the `tree_sitter/include` directory to the include path. We also need to link `libruntime.a` into the binary.
+This program uses the Tree-sitter C API, which is declared in the header file `tree_sitter/runtime.h`, so we need to add the `tree_sitter/include` directory to the include path. We also need to link `libruntime.a` into the binary. We compile the source code of the JSON language directly into the binary as well.
 
 ```sh
 clang \
@@ -71,11 +92,11 @@ clang \
 ./test-json-parser
 ```
 
-###Providing the text to parse
+## Providing the text to parse
 
 Text input is provided to a Tree-sitter parser via a `TSInput` struct, which specifies a function pointer for reading chunks of text. The text can be encoded in either UTF-8 or UTF-16. This interface allows you to efficiently parse text that is stored in your own data structure.
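
As a rough illustration of the chunk-reading idea, here is a minimal, self-contained C sketch. It does not use Tree-sitter's actual `TSInput` definition: `ChunkedText`, `ChunkReader`, `chunked_text_read`, and `drain` are hypothetical names invented for this example, and the convention that a zero-length read signals the end of the input is an assumption of the sketch, not something taken from the library.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical text store: a buffer that is handed out in fixed-size
 * chunks. This stands in for "your own data structure". */
typedef struct {
  const char *text;    /* full document, owned by the caller */
  uint32_t length;     /* total number of bytes in `text` */
  uint32_t offset;     /* index of the next byte to hand out */
  uint32_t chunk_size; /* maximum number of bytes per read */
} ChunkedText;

/* Hypothetical reader interface: an opaque payload plus a function
 * pointer that returns the next chunk and reports its size. */
typedef struct {
  void *payload;
  const char *(*read)(void *payload, uint32_t *bytes_read);
} ChunkReader;

/* Read callback for ChunkedText. Sets *bytes_read to 0 once the
 * whole text has been consumed (an assumption of this sketch). */
static const char *chunked_text_read(void *payload, uint32_t *bytes_read) {
  ChunkedText *self = (ChunkedText *)payload;
  uint32_t remaining = self->length - self->offset;
  uint32_t n = remaining < self->chunk_size ? remaining : self->chunk_size;
  const char *chunk = self->text + self->offset;
  self->offset += n;
  *bytes_read = n;
  return chunk;
}

/* Consumer side: pulls chunks until the reader reports zero bytes,
 * copying them into `out`. Returns the total number of bytes read. */
static uint32_t drain(ChunkReader reader, char *out, uint32_t capacity) {
  uint32_t total = 0;
  for (;;) {
    uint32_t n = 0;
    const char *chunk = reader.read(reader.payload, &n);
    if (n == 0) break;
    assert(total + n <= capacity);
    memcpy(out + total, chunk, n);
    total += n;
  }
  return total;
}
```

The consumer only ever sees the payload pointer and the callback, so the text could just as well live in a rope or a gap buffer, as long as the callback can produce contiguous chunks on demand.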
 
-###Querying the syntax tree
+## Querying the syntax tree
 
 Tree-sitter provides a DOM-style interface for inspecting syntax trees. Functions like `ts_node_child(node, index)` and `ts_node_next_sibling(node)` expose every node in the concrete syntax tree. This is useful for operations like syntax highlighting, which operate on a token-by-token basis. You can also traverse the tree in a more abstract way by using functions like
 `ts_node_named_child(node, index)` and `ts_node_next_named_sibling(node)`. These functions don't expose nodes that were specified in the grammar as anonymous tokens, like `:` and `{`. This is useful when analyzing the meaning of a document.
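
The difference between the two traversal styles can be sketched with a toy model. The code below is not Tree-sitter's implementation and does not use `TSNode`: `ToyNode`, `toy_child`, `toy_named_child`, and `toy_named_child_count` are hypothetical names made up for this illustration. It only shows the idea that a named-child index skips over anonymous tokens.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy syntax node (hypothetical, not TSNode). Anonymous tokens such
 * as `{` and `:` have is_named == false. */
typedef struct ToyNode {
  const char *type;
  bool is_named;
  const struct ToyNode *children;
  uint32_t child_count;
} ToyNode;

/* Concrete view: every child is visible, including punctuation. */
static const ToyNode *toy_child(const ToyNode *node, uint32_t index) {
  return index < node->child_count ? &node->children[index] : NULL;
}

/* Abstract view: the index counts only named children, so anonymous
 * tokens are skipped. Returns NULL if there is no such child. */
static const ToyNode *toy_named_child(const ToyNode *node, uint32_t index) {
  for (uint32_t i = 0; i < node->child_count; i++) {
    const ToyNode *child = &node->children[i];
    if (!child->is_named) continue;
    if (index == 0) return child;
    index--;
  }
  return NULL;
}

/* Number of children that are visible in the abstract view. */
static uint32_t toy_named_child_count(const ToyNode *node) {
  uint32_t count = 0;
  for (uint32_t i = 0; i < node->child_count; i++) {
    if (node->children[i].is_named) count++;
  }
  return count;
}
```

For a toy `object` node whose children are `{`, `pair`, and `}`, `toy_child(object, 0)` yields the `{` token, while `toy_named_child(object, 0)` yields the `pair` node.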