Tree-sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited. Tree-sitter aims to be:
7
10
8
11
- **General** enough to parse any programming language
The Tree-sitter CLI allows you to develop, test, and use Tree-sitter grammars from the command line. It works on macOS, Linux, and Windows.
7
11
@@ -19,7 +23,7 @@ or with `npm`:
19
23
npm install tree-sitter-cli
20
24
```
21
25
22
-
You can also download a pre-built binary for your platform from [the releases page](https://github.com/tree-sitter/tree-sitter/releases/latest).
26
+
You can also download a pre-built binary for your platform from [the releases page].
23
27
24
28
### Dependencies
25
29
@@ -30,8 +34,11 @@ The `tree-sitter` binary itself has no dependencies, but specific commands have
30
34
31
35
### Commands
32
36
33
-
* `generate` - The `tree-sitter generate` command will generate a Tree-sitter parser based on the grammar in the current working directory. See [the documentation](https://tree-sitter.github.io/tree-sitter/creating-parsers) for more information.
37
+
* `generate` - The `tree-sitter generate` command will generate a Tree-sitter parser based on the grammar in the current working directory. See [the documentation] for more information.
34
38
35
-
* `test` - The `tree-sitter test` command will run the unit tests for the Tree-sitter parser in the current working directory. See [the documentation](https://tree-sitter.github.io/tree-sitter/creating-parsers) for more information.
39
+
* `test` - The `tree-sitter test` command will run the unit tests for the Tree-sitter parser in the current working directory. See [the documentation] for more information.
36
40
37
41
* `parse` - The `tree-sitter parse` command will parse a file (or list of files) using Tree-sitter parsers.
File: docs/section-2-using-parsers.md (36 additions, 36 deletions)
@@ -21,21 +21,21 @@ Alternatively, you can incorporate the library in a larger project's build syste
21
21
22
22
**source file:**
23
23
24
-
- `tree-sitter/lib/src/lib.c`
24
+
* `tree-sitter/lib/src/lib.c`
25
25
26
26
**include directories:**
27
27
28
-
- `tree-sitter/lib/src`
29
-
- `tree-sitter/lib/include`
28
+
* `tree-sitter/lib/src`
29
+
* `tree-sitter/lib/include`
30
30
31
31
### The Basic Objects
32
32
33
33
There are four main types of objects involved when using Tree-sitter: languages, parsers, syntax trees, and syntax nodes. In C, these are called `TSLanguage`, `TSParser`, `TSTree`, and `TSNode`.
34
34
35
-
- A `TSLanguage` is an opaque object that defines how to parse a particular programming language. The code for each `TSLanguage` is generated by Tree-sitter. Many languages are already available in separate git repositories within the [Tree-sitter GitHub organization](https://github.com/tree-sitter). See [the next page](./creating-parsers) for how to create new languages.
36
-
- A `TSParser` is a stateful object that can be assigned a `TSLanguage` and used to produce a `TSTree` based on some source code.
37
-
- A `TSTree` represents the syntax tree of an entire source code file. It contains `TSNode` instances that indicate the structure of the source code. It can also be edited and used to produce a new `TSTree` in the event that the source code changes.
38
-
- A `TSNode` represents a single node in the syntax tree. It tracks its start and end positions in the source code, as well as its relation to other nodes like its parent, siblings and children.
35
+
* A `TSLanguage` is an opaque object that defines how to parse a particular programming language. The code for each `TSLanguage` is generated by Tree-sitter. Many languages are already available in separate git repositories within the [Tree-sitter GitHub organization](https://github.com/tree-sitter). See [the next page](./creating-parsers) for how to create new languages.
36
+
* A `TSParser` is a stateful object that can be assigned a `TSLanguage` and used to produce a `TSTree` based on some source code.
37
+
* A `TSTree` represents the syntax tree of an entire source code file. It contains `TSNode` instances that indicate the structure of the source code. It can also be edited and used to produce a new `TSTree` in the event that the source code changes.
38
+
* A `TSNode` represents a single node in the syntax tree. It tracks its start and end positions in the source code, as well as its relation to other nodes like its parent, siblings and children.
39
39
40
40
### An Example Program
41
41
@@ -442,31 +442,31 @@ Many code analysis tasks involve searching for patterns in syntax trees. Tree-si
442
442
443
443
A _query_ consists of one or more _patterns_, where each pattern is an [S-expression](https://en.wikipedia.org/wiki/S-expression) that matches a certain set of nodes in a syntax tree. The expression to match a given node consists of a pair of parentheses containing two things: the node's type, and optionally, a series of other S-expressions that match the node's children. For example, this pattern would match any `binary_expression` node whose children are both `number_literal` nodes:
Children can also be omitted. For example, this would match any `binary_expression` where at least _one_ child is a `string_literal` node:
450
450
451
-
```scheme
451
+
```scheme
452
452
(binary_expression (string_literal))
453
453
```
454
454
455
455
#### Fields
456
456
457
457
In general, it's a good idea to make patterns more specific by specifying [field names](#node-field-names) associated with child nodes. You do this by prefixing a child pattern with a field name followed by a colon. For example, this pattern would match an `assignment_expression` node where the `left` child is a `member_expression` whose `object` is a `call_expression`.
458
458
459
-
```scheme
459
+
```scheme
460
460
(assignment_expression
461
461
left: (member_expression
462
462
object: (call_expression)))
463
463
```
464
464
465
465
#### Negated Fields
466
466
467
-
You can also constrain a pattern so that it only matches nodes that *lack* a certain field. To do this, add a field name prefixed by a `!` within the parent pattern. For example, this pattern would match a class declaration with no type parameters:
467
+
You can also constrain a pattern so that it only matches nodes that _lack_ a certain field. To do this, add a field name prefixed by a `!` within the parent pattern. For example, this pattern would match a class declaration with no type parameters:
468
468
469
-
```scheme
469
+
```scheme
470
470
(class_declaration
471
471
name: (identifier) @class_name
472
472
!type_parameters)
@@ -476,7 +476,7 @@ You can also constrain a pattern so that it only matches nodes that *lack* a cer
476
476
477
477
The parenthesized syntax for writing nodes only applies to [named nodes](#named-vs-anonymous-nodes). To match specific anonymous nodes, you write their name between double quotes. For example, this pattern would match any `binary_expression` where the operator is `!=` and the right side is `null`:
478
478
479
-
```scheme
479
+
```scheme
480
480
(binary_expression
481
481
operator: "!="
482
482
right: (null))
@@ -488,15 +488,15 @@ When matching patterns, you may want to process specific nodes within the patter
488
488
489
489
For example, this pattern would match any assignment of a `function` to an `identifier`, and it would associate the name `the-function-name` with the identifier:
490
490
491
-
```scheme
491
+
```scheme
492
492
(assignment_expression
493
493
left: (identifier) @the-function-name
494
494
right: (function))
495
495
```
496
496
497
497
And this pattern would match all method definitions, associating the name `the-method-name` with the method name and `the-class-name` with the containing class name:
498
498
499
-
```scheme
499
+
```scheme
500
500
(class_declaration
501
501
name: (identifier) @the-class-name
502
502
body: (class_body
@@ -510,21 +510,21 @@ You can match a repeating sequence of sibling nodes using the postfix `+` and `*
510
510
511
511
For example, this pattern would match a sequence of one or more comments:
512
512
513
-
```scheme
513
+
```scheme
514
514
(comment)+
515
515
```
516
516
517
517
This pattern would match a class declaration, capturing all of the decorators if any were present:
518
518
519
-
```scheme
519
+
```scheme
520
520
(class_declaration
521
521
(decorator)* @the-decorator
522
522
name: (identifier) @the-name)
523
523
```
524
524
525
525
You can also mark a node as optional using the `?` operator. For example, this pattern would match all function calls, capturing a string argument if one was present:
526
526
527
-
```scheme
527
+
```scheme
528
528
(call_expression
529
529
function: (identifier) @the-function
530
530
arguments: (arguments (string)? @the-string-arg))
@@ -534,7 +534,7 @@ You can also mark a node as optional using the `?` operator. For example, this p
534
534
535
535
You can also use parentheses for grouping a sequence of _sibling_ nodes. For example, this pattern would match a comment followed by a function declaration:
536
536
537
-
```scheme
537
+
```scheme
538
538
(
539
539
(comment)
540
540
(function_declaration)
@@ -543,7 +543,7 @@ You can also use parentheses for grouping a sequence of _sibling_ nodes. For exa
543
543
544
544
Any of the quantification operators mentioned above (`+`, `*`, and `?`) can also be applied to groups. For example, this pattern would match a comma-separated series of numbers:
545
545
546
-
```scheme
546
+
```scheme
547
547
(
548
548
(number)
549
549
("," (number))*
@@ -558,7 +558,7 @@ This is similar to _character classes_ from regular expressions (`[abc]` matches
558
558
For example, this pattern would match a call to either a variable or an object property.
559
559
In the case of a variable, capture it as `@function`, and in the case of a property, capture it as `@method`:
560
560
561
-
```scheme
561
+
```scheme
562
562
(call_expression
563
563
function: [
564
564
(identifier) @function
@@ -569,7 +569,7 @@ In the case of a variable, capture it as `@function`, and in the case of a prope
569
569
570
570
This pattern would match a set of possible keyword tokens, capturing them as `@keyword`:
571
571
572
-
```scheme
572
+
```scheme
573
573
[
574
574
"break"
575
575
"delete"
@@ -592,7 +592,7 @@ and `_` will match any named or anonymous node.
592
592
593
593
For example, this pattern would match any node inside a call:
594
594
595
-
```scheme
595
+
```scheme
596
596
(call (_) @call.inner)
597
597
```
598
598
@@ -602,21 +602,21 @@ The anchor operator, `.`, is used to constrain the ways in which child patterns
602
602
603
603
When `.` is placed before the _first_ child within a parent pattern, the child will only match when it is the first named node in the parent. For example, the below pattern matches a given `array` node at most once, assigning the `@the-element` capture to the first `identifier` node in the parent `array`:
604
604
605
-
```scheme
605
+
```scheme
606
606
(array . (identifier) @the-element)
607
607
```
608
608
609
609
Without this anchor, the pattern would match once for every identifier in the array, with `@the-element` bound to each matched identifier.
610
610
611
611
Similarly, an anchor placed after a pattern's _last_ child will cause that child pattern to only match nodes that are the last named child of their parent. The below pattern matches only nodes that are the last named child within a `block`.
612
612
613
-
```scheme
613
+
```scheme
614
614
(block (_) @last-expression .)
615
615
```
616
616
617
617
Finally, an anchor _between_ two child patterns will cause the patterns to only match nodes that are immediate siblings. The pattern below, given a long dotted name like `a.b.c.d`, will only match pairs of consecutive identifiers: `a, b`, `b, c`, and `c, d`.
618
618
619
-
```scheme
619
+
```scheme
620
620
(dotted_name
621
621
(identifier) @prev-id
622
622
.
@@ -633,7 +633,7 @@ You can also specify arbitrary metadata and conditions associated with a pattern
633
633
634
634
For example, this pattern would match identifiers whose names are written in `SCREAMING_SNAKE_CASE`:
635
635
636
-
```scheme
636
+
```scheme
637
637
(
638
638
(identifier) @constant
639
639
(#match? @constant "^[A-Z][A-Z_]+")
@@ -642,7 +642,7 @@ For example, this pattern would match identifier whose names is written in `SCRE
642
642
643
643
And this pattern would match key-value pairs where the `value` is an identifier with the same name as the key:
644
644
645
-
```scheme
645
+
```scheme
646
646
(
647
647
(pair
648
648
key: (property_identifier) @key-name
@@ -723,8 +723,8 @@ The node types file contains an array of objects, each of which describes a part
723
723
724
724
Every object in this array has these two entries:
725
725
726
-
- `"type"` - A string that indicates which grammar rule the node represents. This corresponds to the `ts_node_type` function described [above](#syntax-nodes).
727
-
- `"named"` - A boolean that indicates whether this kind of node corresponds to a rule name in the grammar or just a string literal. See [above](#named-vs-anonymous-nodes) for more info.
726
+
* `"type"` - A string that indicates which grammar rule the node represents. This corresponds to the `ts_node_type` function described [above](#syntax-nodes).
727
+
* `"named"` - A boolean that indicates whether this kind of node corresponds to a rule name in the grammar or just a string literal. See [above](#named-vs-anonymous-nodes) for more info.
728
728
729
729
Examples:
730
730
@@ -745,14 +745,14 @@ Together, these two fields constitute a unique identifier for a node type; no tw
745
745
746
746
Many syntax nodes can have _children_. The node type object describes the possible children that a node can have using the following entries:
747
747
748
-
- `"fields"` - An object that describes the possible [fields](#node-field-names) that the node can have. The keys of this object are field names, and the values are _child type_ objects, described below.
749
-
- `"children"` - Another _child type_ object that describes all of the node's possible _named_ children _without_ fields.
748
+
* `"fields"` - An object that describes the possible [fields](#node-field-names) that the node can have. The keys of this object are field names, and the values are _child type_ objects, described below.
749
+
* `"children"` - Another _child type_ object that describes all of the node's possible _named_ children _without_ fields.
750
750
751
751
A _child type_ object describes a set of child nodes using the following entries:
752
752
753
-
- `"required"` - A boolean indicating whether there is always _at least one_ node in this set.
754
-
- `"multiple"` - A boolean indicating whether there can be _multiple_ nodes in this set.
755
-
- `"types"` - An array of objects that represent the possible types of nodes in this set. Each object has two keys: `"type"` and `"named"`, whose meanings are described above.
753
+
* `"required"` - A boolean indicating whether there is always _at least one_ node in this set.
754
+
* `"multiple"` - A boolean indicating whether there can be _multiple_ nodes in this set.
755
+
* `"types"` - An array of objects that represent the possible types of nodes in this set. Each object has two keys: `"type"` and `"named"`, whose meanings are described above.
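
As a rough illustration of how the `"required"`, `"multiple"`, and `"types"` entries combine, the Python sketch below reads a small node types array and turns each field's _child type_ object into a cardinality label plus a list of type names. The sample JSON is hand-written for this example (it is not generated from a real grammar), and the helper names `describe_child_type` and `summarise` are hypothetical.

```python
import json

# Hypothetical sample in the shape described above; not output from a real grammar.
SAMPLE_NODE_TYPES = """
[
  {
    "type": "method_definition",
    "named": true,
    "fields": {
      "name": {
        "multiple": false,
        "required": true,
        "types": [{"type": "property_identifier", "named": true}]
      },
      "decorator": {
        "multiple": true,
        "required": false,
        "types": [{"type": "decorator", "named": true}]
      }
    }
  }
]
"""


def describe_child_type(child_type):
    """Summarise a child type object as a (cardinality, type names) pair."""
    required = child_type["required"]
    multiple = child_type["multiple"]
    if required and multiple:
        cardinality = "one or more"
    elif required:
        cardinality = "exactly one"
    elif multiple:
        cardinality = "zero or more"
    else:
        cardinality = "zero or one"
    names = [t["type"] for t in child_type["types"]]
    return cardinality, names


def summarise(node_types_json):
    """Map (type, named) -> {field name: (cardinality, type names)}."""
    summary = {}
    for node in json.loads(node_types_json):
        # "type" and "named" together identify the node type.
        key = (node["type"], node["named"])
        fields = node.get("fields", {})
        summary[key] = {name: describe_child_type(ct) for name, ct in fields.items()}
    return summary


if __name__ == "__main__":
    for key, fields in summarise(SAMPLE_NODE_TYPES).items():
        print(key, fields)
```

Keying the summary on the `("type", "named")` pair mirrors the way the two entries identify a node type; the cardinality labels are just one convenient way to read the two booleans.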
756
756
757
757
Example with fields:
758
758
@@ -812,7 +812,7 @@ In Tree-sitter grammars, there are usually certain rules that represent abstract
812
812
813
813
Normally, hidden rules are not mentioned in the node types file, since they don't appear in the syntax tree. But if you add a hidden rule to the grammar's [`supertypes` list](./creating-parsers#the-grammar-dsl), then it _will_ show up in the node types file, with the following special entry:
814
814
815
-
- `"subtypes"` - An array of objects that specify the _types_ of nodes that this 'supertype' node can wrap.
815
+
* `"subtypes"` - An array of objects that specify the _types_ of nodes that this 'supertype' node can wrap.
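
One way a consumer of the node types file might use the `"subtypes"` entry is to build a lookup from each supertype to the node types it can wrap. The Python sketch below does that for a small hand-written array; the rule names (`_expression`, `identifier`, `binary_expression`) and the helpers `supertype_map` and `is_subtype` are assumptions made for illustration only.

```python
import json

# Hypothetical node types array; the rule names are illustrative only.
SAMPLE = """
[
  {
    "type": "_expression",
    "named": true,
    "subtypes": [
      {"type": "identifier", "named": true},
      {"type": "binary_expression", "named": true}
    ]
  },
  {"type": "identifier", "named": true},
  {"type": "binary_expression", "named": true}
]
"""


def supertype_map(node_types_json):
    """Return {supertype name: sorted list of subtype names}."""
    result = {}
    for node in json.loads(node_types_json):
        # Only supertype entries carry a "subtypes" key.
        if "subtypes" in node:
            result[node["type"]] = sorted(t["type"] for t in node["subtypes"])
    return result


def is_subtype(mapping, supertype, node_type):
    """Check whether node_type is listed directly under supertype."""
    return node_type in mapping.get(supertype, [])


if __name__ == "__main__":
    mapping = supertype_map(SAMPLE)
    print(mapping)
    print(is_subtype(mapping, "_expression", "identifier"))
```

This sketch only checks direct membership; if a listed subtype is itself a supertype, a fuller version would need to expand it recursively.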