40 releases
Uses new Rust 2024
| 0.14.0 | Oct 22, 2025 |
|---|---|
| 0.13.10 | Feb 28, 2025 |
| 0.13.8 | Nov 7, 2024 |
| 0.13.7 | Jun 14, 2024 |
| 0.1.1 | Dec 18, 2018 |
#258 in Parser tooling
53,352 downloads per month
Used in 14 crates
(11 directly)
295KB
7K
SLoC
cfgrammar
cfgrammar reads in grammar files, processes them, and provides a convenient
API for operating with them. It may be of interest to those manipulating
grammars directly, or who wish to use custom types of parsers.
lib.rs:
A library for manipulating Context Free Grammars (CFG). It is impractical to fully homogenise all the types of grammars out there, so the aim is for different grammar types to have completely separate implementations. Code that wants to be generic over more than one grammar type can then use an "adapter" to homogenise the particular grammar types of interest. Currently this is a little academic, since only Yacc-style grammars are supported (albeit several variants of Yacc grammars).
Unfortunately, CFG terminology is something of a mess. Some people use different terms for the same concept interchangeably; some use different terms to convey subtle differences of meaning (but without complete uniformity). "Token", "terminal", and "lexeme" are examples of this: they are synonyms in some tools and papers, but not in others.
In order to make this library somewhat coherent, we therefore use some basic terminology guidelines for major concepts (acknowledging that this will cause clashes with some grammar types).
- A grammar is an ordered sequence of productions.
- A production is an ordered sequence of symbols.
- A rule maps a name to one or more productions.
- A token is the name of a syntactic element.
For example, in the following Yacc grammar:
R1: "a" "b" | R2; R2: "c";
the following statements are true:
- There are 3 productions. 1: ["a", "b"] 2: ["R2"] 3: ["c"]`
- There are two rules: R1 and R2. The mapping to productions is {R1: {1, 2}, R2: {3}}
- There are three tokens: a, b, and c.
cfgrammar makes the following guarantees about grammars:
- Productions are numbered from
0toprods_len() - 1(inclusive). - Rules are numbered from
0torules_len() - 1(inclusive). - Tokens are numbered from
0totoks_len() - 1(inclusive). - The StorageT type used to store productions, rules, and token indices can be infallibly
converted into usize (see
TIdxand friends for more details).
For most current uses, the main function to investigate is
YaccGrammar::new() and/or
YaccGrammar::new_with_storaget()
which take as input a Yacc grammar.
Dependencies
~2.9–4.5MB
~75K SLoC