Commit 1878b42

Add documentation about included ranges
1 parent 78b5481 commit 1878b42

1 file changed, +105 -1 lines changed

docs/section-2-using-parsers.md

Lines changed: 105 additions & 1 deletion
@@ -27,7 +27,7 @@ Here's an example of a simple C program that uses the Tree-sitter [JSON parser](
 #include <assert.h>
 #include <string.h>
 #include <stdio.h>
-#include "tree_sitter/runtime.h"
+#include <tree_sitter/runtime.h>
 
 // Declare the `tree_sitter_json` function, which is
 // implemented by the `tree-sitter-json` library.
@@ -236,6 +236,110 @@ void ts_node_edit(TSNode *, const TSInputEdit *);
 
 Then, you can call `ts_parser_parse` again, passing in the old tree. This will create a new tree that internally shares structure with the old tree.
 
+## Multi-language Documents
+
+Sometimes, different parts of a file may be written in different languages. For example, templating languages like [EJS](http://ejs.co) and [ERB](https://ruby-doc.org/stdlib-2.5.1/libdoc/erb/rdoc/ERB.html) allow you to generate HTML by writing a mixture of HTML and another language like JavaScript or Ruby.
+
+Tree-sitter handles these types of documents by allowing you to create a syntax tree based on the text in certain *ranges* of a file.
+
+```c
+typedef struct {
+  TSPoint start_point;
+  TSPoint end_point;
+  uint32_t start_byte;
+  uint32_t end_byte;
+} TSRange;
+
+void ts_parser_set_included_ranges(
+  TSParser *self,
+  const TSRange *ranges,
+  uint32_t range_count
+);
+```
+
+For example, consider this ERB document:
+
+```erb
+<ul>
+  <% people.each do |person| %>
+    <li><%= person.name %></li>
+  <% end %>
+</ul>
+```
+
+Conceptually, it can be represented by three syntax trees with overlapping ranges: an ERB syntax tree, a Ruby syntax tree, and an HTML syntax tree. You could generate these syntax trees as follows:
+
+```c
+#include <string.h>
+#include <tree_sitter/runtime.h>
+
+// These functions are each implemented in their own repo.
+const TSLanguage *tree_sitter_embedded_template();
+const TSLanguage *tree_sitter_html();
+const TSLanguage *tree_sitter_ruby();
+
+int main(int argc, const char **argv) {
+  const char *text = argv[1];
+  unsigned len = strlen(text);
+
+  // Parse the entire text as ERB.
+  TSParser *parser = ts_parser_new();
+  ts_parser_set_language(parser, tree_sitter_embedded_template());
+  TSTree *erb_tree = ts_parser_parse_string(parser, NULL, text, len);
+  TSNode erb_root_node = ts_tree_root_node(erb_tree);
+
+  // Find the ranges of the `content` nodes, which represent
+  // the underlying HTML, and the `code` nodes, which represent
+  // the interpolated Ruby.
+  TSRange html_ranges[10];
+  TSRange ruby_ranges[10];
+  unsigned html_range_count = 0;
+  unsigned ruby_range_count = 0;
+  unsigned child_count = ts_node_child_count(erb_root_node);
+
+  for (unsigned i = 0; i < child_count; i++) {
+    TSNode node = ts_node_child(erb_root_node, i);
+    if (strcmp(ts_node_type(node), "content") == 0) {
+      html_ranges[html_range_count++] = (TSRange) {
+        ts_node_start_point(node),
+        ts_node_end_point(node),
+        ts_node_start_byte(node),
+        ts_node_end_byte(node),
+      };
+    } else {
+      TSNode code_node = ts_node_named_child(node, 0);
+      ruby_ranges[ruby_range_count++] = (TSRange) {
+        ts_node_start_point(code_node),
+        ts_node_end_point(code_node),
+        ts_node_start_byte(code_node),
+        ts_node_end_byte(code_node),
+      };
+    }
+  }
+
+  // Use the HTML ranges to parse the HTML.
+  ts_parser_set_language(parser, tree_sitter_html());
+  ts_parser_set_included_ranges(parser, html_ranges, html_range_count);
+  TSTree *html_tree = ts_parser_parse_string(parser, NULL, text, len);
+  TSNode html_root_node = ts_tree_root_node(html_tree);
+
+  // Use the Ruby ranges to parse the Ruby.
+  ts_parser_set_language(parser, tree_sitter_ruby());
+  ts_parser_set_included_ranges(parser, ruby_ranges, ruby_range_count);
+  TSTree *ruby_tree = ts_parser_parse_string(parser, NULL, text, len);
+  TSNode ruby_root_node = ts_tree_root_node(ruby_tree);
+
+  // Print all three trees.
+  char *erb_sexp = ts_node_string(erb_root_node);
+  char *html_sexp = ts_node_string(html_root_node);
+  char *ruby_sexp = ts_node_string(ruby_root_node);
+  printf("ERB: %s\n", erb_sexp);
+  printf("HTML: %s\n", html_sexp);
+  printf("Ruby: %s\n", ruby_sexp);
+  return 0;
+}
+```
+
 ## Concurrency
 
 Tree-sitter supports multi-threaded use cases by making syntax trees very cheap to copy.
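
A note on the included-ranges example in this diff: it stores ranges in fixed arrays of 10 entries, so a template with more than ten `content` or `code` nodes would write past the end of the array. The sketch below shows one possible way to collect ranges in a growable list instead. It is an illustration under assumptions, not part of the commit: `RangeList`, `range_list_push`, `range_list_free`, and `point_at_byte` are hypothetical names, and `TSPoint` and `TSRange` are redeclared locally (with `row` and `column` counted from zero, columns in bytes) only so the sketch compiles without the Tree-sitter headers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Local stand-ins that mirror the TSRange layout shown in the diff.
// With the real library, include its header instead of these typedefs.
typedef struct {
  uint32_t row;
  uint32_t column;
} TSPoint;

typedef struct {
  TSPoint start_point;
  TSPoint end_point;
  uint32_t start_byte;
  uint32_t end_byte;
} TSRange;

// Hypothetical growable list of ranges (not a Tree-sitter type).
typedef struct {
  TSRange *items;
  uint32_t count;
  uint32_t capacity;
} RangeList;

// Hypothetical helper: compute the row/column of a byte offset by
// scanning the text for newlines. Columns are counted in bytes.
TSPoint point_at_byte(const char *text, uint32_t byte) {
  TSPoint point = {0, 0};
  for (uint32_t i = 0; i < byte; i++) {
    if (text[i] == '\n') {
      point.row++;
      point.column = 0;
    } else {
      point.column++;
    }
  }
  return point;
}

// Append the byte span [start, end) of `text` to the list, doubling the
// buffer when it is full. Returns 1 on success, 0 if allocation fails.
int range_list_push(RangeList *list, const char *text,
                    uint32_t start, uint32_t end) {
  assert(start <= end && end <= strlen(text));
  if (list->count == list->capacity) {
    uint32_t new_capacity = list->capacity ? list->capacity * 2 : 8;
    TSRange *items = realloc(list->items, new_capacity * sizeof(TSRange));
    if (!items) return 0;
    list->items = items;
    list->capacity = new_capacity;
  }
  list->items[list->count++] = (TSRange) {
    point_at_byte(text, start),
    point_at_byte(text, end),
    start,
    end,
  };
  return 1;
}

// Release the buffer and reset the list to its empty state.
void range_list_free(RangeList *list) {
  free(list->items);
  list->items = NULL;
  list->count = 0;
  list->capacity = 0;
}
```

With a list built this way, `list.items` and `list.count` could be passed as the `ranges` and `range_count` arguments of `ts_parser_set_included_ranges`. When the spans come from syntax nodes, as in the example in the diff, the node accessors already supply the points, so `point_at_byte` is only useful for spans found some other way.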
