blob: 3ed5baf431052c31c93fe7eaf7ea99878082e59f [file] [log] [blame] [view] [edit]
# SystemVerilog Concrete Syntax Tree
<!--*
freshness: { owner: 'hzeller' reviewed: '2020-10-04' }
*-->
The SystemVerilog concrete syntax tree (CST) uses the language-agnostic syntax
tree structure with its own set of (`int`) enumerations for tree nodes and
leaves. The CST includes all syntactically relevant tokens, no comments, no
attributes (at this time), and limited support for preprocessing constructs.
The exact [node enumerations](verilog_nonterminals.h) should be considered
fragile until stated otherwise; they may change, get new enumerations, or remove
obsolete ones. Code that depends direct use of these enumerations should be
well-tested so that breakages are easy to diagnose and fix.
CST leaves contain tokens, which bear the token enumerations generated from the
[parser implementation](../parser/verilog.y). These token enumerations are
relatively stable. However, where practical, we encourage use of
[token classification functions](../parser/verilog_token_classifications.h).
The node enumerations are used directly in the semantic actions of the
[SystemVerilog parser](../parser/verilog.y), with functions like
`MakeTaggedNode`.
Both node and token enumerations are used in syntax tree analyzers and also
drive formatting decisions in the formatter.
## Ideals
_Ideal_ properties of CST nodes:
* Every construction of a CST node follows a **consistent** substructure, as
if there were a class with one constructor for each node type. Consistency
allows one to write simple functions that directly access substructure by
descending through CST nodes positionally. For every node enumeration
`kFoo`, there should be `MakeFoo` function that constructs a CST node from
its arguments. Accessor functions should be short and composable.
* Construction also provides an opportunity to check that the programmer
has not made a mistake, by asserting invariant properties about the
arguments.
* Access into a CST node's substructure is **exclusively** done by
`GetFooFromBar`-style functions that hide the structural details of a node,
while remaining consistent with construction.
* Both implementation of the constructor and accessor functions should ideally
come from a _single-source-of-truth_, that is, they should be **generated**
(no later than compile-time) from one specification for each node type,
rather than maintained independently.
This is not the case today because of the haste in which initial development
took place, but help is wanted towards achieving the aforementioned ideals. See
also https://github.com/chipsalliance/verible/issues/159.
## Abstract Syntax Tree?
_Wouldn't an abstract syntax tree (AST) satisfy the above ideals?_ Yes, this
would take time to write, and we would need
[help](https://github.com/chipsalliance/verible/issues/184).
An AST may not be a great representation for _unpreprocessed_ code, which is the
focus of the first developer tool applications. Having a
[standard-compliant SV preprocessor](https://github.com/chipsalliance/verible/issues/183)
would pave the way to making an AST more useful.
## Testing
Most CST accessor function tests should follow this outline:
* Declare an array of test data in the form of
[SyntaxTreeSearchTestCase](https://cs.opensource.google/verible/verible/+/master:common/analysis/syntax_tree_search_test_utils.h)
* Each element compactly represents the code to analyze, and the set of
expected findings as annotated subranges of text.
* For every function-under-test, establish a function that extracts the
targeted subranges of text (which must be non-overlapping). This could be a
simple find-function on a syntax tree or contain any sequence of search
refinements.
* Pass these into the
[TestVerilogSyntaxRangeMatches](https://cs.opensource.google/verible/verible/+/master:verilog/CST/match_test_utils.h)
test driver function which compare actual vs. expected subranges.