Commit 3bd6fae

Author: Patrick Thomson
Merge pull request tree-sitter#1649 from tree-sitter/tag-name-conventions
Describe tagging and associated naming conventions for syntax captures.
2 parents af00782 + 764c8c8 commit 3bd6fae

1 file changed: 103 additions and 0 deletions

@@ -0,0 +1,103 @@
---
title: Code Navigation Systems
permalink: code-navigation-systems
---

# Code Navigation Systems

Tree-sitter can be used in conjunction with its [tree query language](https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries) as part of code navigation systems. One such system is the `tree-sitter tags` command, which emits a textual dump of the interesting syntactic nodes in its file argument. A notable application of this is GitHub's support for [search-based code navigation](https://docs.github.com/en/repositories/working-with-files/using-files/navigating-code-on-github#precise-and-search-based-navigation). This document describes how to integrate with such systems and how to extend this functionality to any language with a Tree-sitter grammar.
## Tagging and captures

*Tagging* is the act of identifying the entities that can be named in a program. We use Tree-sitter queries to find those entities. Having found them, we use a syntax capture to label the entity and its name.

The essence of a given tag lies in two pieces of data: the _role_ of the entity that is matched (i.e. whether it is a definition or a reference) and the _kind_ of that entity, which describes how the entity is used (e.g. whether it's a class definition, function call, variable reference, and so on). Our convention is to use a syntax capture following the `@role.kind` capture name format, and another inner capture, always called `@name`, that pulls out the name of a given identifier.

You may optionally include a capture named `@doc` to bind a docstring. The tagging system provides two built-in functions, `#select-adjacent!` and `#strip!`, that are convenient for removing comment syntax from a docstring. `#strip!` takes a capture as its first argument and a regular expression, expressed as a quoted string, as its second. Any text matched by the regular expression is removed from the text associated with the passed capture. `#select-adjacent!`, when passed two capture names, filters the text associated with the first capture so that only nodes adjacent to the second capture are preserved. This can be useful when writing queries that would otherwise include too much information in matched comments.
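To make the behavior of the two built-ins concrete, here is a minimal Python sketch of the semantics described above. It is not the Tree-sitter implementation: the function names `strip` and `select_adjacent` are hypothetical, and the sketch assumes that each comment occupies a single row and can be identified by its zero-based row number.

```python
import re


def strip(text: str, pattern: str) -> str:
    """Remove every match of `pattern` from `text` (the described effect of `#strip!`)."""
    return re.sub(pattern, "", text)


def select_adjacent(doc_rows, target_row):
    """Keep only the comment rows forming an unbroken run directly above `target_row`.

    Assumption: one comment per row, rows are zero-based integers.
    """
    kept = []
    expected = target_row - 1
    for row in sorted(doc_rows, reverse=True):
        if row == expected:
            kept.append(row)
            expected -= 1
        elif row < expected:
            # A gap was found, so earlier comments are not adjacent.
            break
    return sorted(kept)


# The query string "^#\\s*" denotes the regular expression ^#\s*
print(strip("# is adjacent, will be", r"^#\s*"))
# Comments on rows 2 and 4, definition on row 5: only row 4 is adjacent.
print(select_adjacent([2, 4], 5))
```

Under these assumptions, a comment separated from the definition by a blank line is dropped, while a contiguous run of comments directly above the definition is kept in full.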
## Examples

This [query](https://github.com/tree-sitter/tree-sitter-python/blob/78c4e9b6b2f08e1be23b541ffced47b15e2972ad/queries/tags.scm#L4-L5) recognizes Python function definitions and captures their declared name. The `function_definition` syntax node is defined in the [Python Tree-sitter grammar](https://github.com/tree-sitter/tree-sitter-python/blob/78c4e9b6b2f08e1be23b541ffced47b15e2972ad/grammar.js#L354).

``` scheme
(function_definition
  name: (identifier) @name) @definition.function
```
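For readers without a Tree-sitter toolchain at hand, the following sketch approximates what this query captures, using Python's standard `ast` module as a stand-in rather than Tree-sitter itself. The helper name `function_definitions` and the sample source are hypothetical. Because the query pattern places no constraint on the parent node, the sketch likewise reports functions nested inside classes.

```python
import ast


def function_definitions(source: str):
    """Return (name, zero_based_row) for each function definition in `source`."""
    tree = ast.parse(source)
    found = [
        (node.name, node.lineno - 1)  # ast rows are 1-based; shift to 0-based
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    return sorted(found, key=lambda item: item[1])


SAMPLE = (
    "def greet(name):\n"
    "    return name\n"
    "\n"
    "class Greeter:\n"
    "    def wave(self):\n"
    "        pass\n"
)

print(function_definitions(SAMPLE))
```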
A more sophisticated query can be found in the [JavaScript Tree-sitter repository](https://github.com/tree-sitter/tree-sitter-javascript/blob/fdeb68ac8d2bd5a78b943528bb68ceda3aade2eb/queries/tags.scm#L63-L70):

``` scheme
(assignment_expression
  left: [
    (identifier) @name
    (member_expression
      property: (property_identifier) @name)
  ]
  right: [(arrow_function) (function)]
) @definition.function
```
An even more sophisticated query is in the [Ruby Tree-sitter repository](https://github.com/tree-sitter/tree-sitter-ruby/blob/1ebfdb288842dae5a9233e2509a135949023dd82/queries/tags.scm#L24-L43), which uses built-in functions to strip the Ruby comment character (`#`) from the docstrings associated with a class or singleton-class declaration, then selects only the docstrings adjacent to the node matched as `@definition.class`.

``` scheme
(
  (comment)* @doc
  .
  [
    (class
      name: [
        (constant) @name
        (scope_resolution
          name: (_) @name)
      ]) @definition.class
    (singleton_class
      value: [
        (constant) @name
        (scope_resolution
          name: (_) @name)
      ]) @definition.class
  ]
  (#strip! @doc "^#\\s*")
  (#select-adjacent! @doc @definition.class)
)
```
The table below describes a standard vocabulary for kinds and roles during the tagging process. New applications may extend (or only recognize a subset of) these capture names, but it is desirable to standardize on the names below.

| Category                 | Tag                         |
|--------------------------|-----------------------------|
| Class definitions        | `@definition.class`         |
| Function definitions     | `@definition.function`      |
| Interface definitions    | `@definition.interface`     |
| Method definitions       | `@definition.method`        |
| Module definitions       | `@definition.module`        |
| Function/method calls    | `@reference.call`           |
| Class reference          | `@reference.class`          |
| Interface implementation | `@reference.implementation` |
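A consumer of tags might want to split a capture name into its role and kind and check it against this vocabulary. The following is a small sketch of that idea; the names `STANDARD_TAGS`, `split_capture`, and `is_standard` are hypothetical and not part of any Tree-sitter API.

```python
# Standard vocabulary from the table above, keyed by role.
STANDARD_TAGS = {
    "definition": {"class", "function", "interface", "method", "module"},
    "reference": {"call", "class", "implementation"},
}


def split_capture(capture: str):
    """Split a capture such as '@definition.function' into (role, kind)."""
    name = capture[1:] if capture.startswith("@") else capture
    role, sep, kind = name.partition(".")
    if not sep or not role or not kind:
        raise ValueError(f"not in @role.kind form: {capture!r}")
    return role, kind


def is_standard(capture: str) -> bool:
    """Report whether the capture belongs to the standard vocabulary."""
    role, kind = split_capture(capture)
    return kind in STANDARD_TAGS.get(role, set())


print(split_capture("@definition.function"))
print(is_standard("@reference.call"))
print(is_standard("@definition.macro"))
```

Since applications may extend the vocabulary, a non-standard capture is reported rather than rejected here.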
## Command-line invocation

You can use the `tree-sitter tags` command to test out a tags query file, passing as arguments one or more files to tag. As an example, consider running this tool from within the Tree-sitter Ruby repository over the following file, `test.rb`:

``` ruby
module Foo
  class Bar
    # won't be included

    # is adjacent, will be
    def baz
    end
  end
end
```
93+
94+
Invoking `tree-sitter tags test.rb` produces the following console output, representing matched entities' name, role, location, first line, and docstring:
95+
96+
```
97+
test.rb
98+
Foo | module def (0, 7) - (0, 10) `module Foo`
99+
Bar | class def (1, 8) - (1, 11) `class Bar`
100+
baz | method def (2, 8) - (2, 11) `def baz` "is adjacent, will be"
101+
```
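If you want to post-process this output, each entry line can be split into fields with a regular expression. The sketch below is based only on the sample output shown above, so the exact layout is an assumption: in particular, the `ref` alternative for references is assumed (the sample shows only `def`), and the `Tag` type and `parse_tag_line` helper are hypothetical names.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Tag(NamedTuple):
    name: str
    kind: str
    role: str
    start: Tuple[int, int]
    end: Tuple[int, int]
    line: str
    doc: Optional[str]


# Layout inferred from the sample output; "ref" is an assumed alternative.
TAG_LINE = re.compile(
    r'^\s*(?P<name>\S+)\s*\|\s*(?P<kind>\w+)\s+(?P<role>def|ref)\s+'
    r'\((?P<sr>\d+), (?P<sc>\d+)\) - \((?P<er>\d+), (?P<ec>\d+)\)\s+'
    r'`(?P<line>[^`]*)`(?:\s+"(?P<doc>.*)")?\s*$'
)


def parse_tag_line(text: str) -> Optional[Tag]:
    """Parse one entry line; return None for lines such as the file name."""
    m = TAG_LINE.match(text)
    if m is None:
        return None
    return Tag(
        name=m["name"],
        kind=m["kind"],
        role=m["role"],
        start=(int(m["sr"]), int(m["sc"])),
        end=(int(m["er"]), int(m["ec"])),
        line=m["line"],
        doc=m["doc"],
    )


print(parse_tag_line('baz | method def (2, 8) - (2, 11) `def baz` "is adjacent, will be"'))
```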
102+
103+
It is expected that tag queries for a given language are located at `queries/tags.scm` in that language's repository.
