blob: 28c80962508ffba4ee44b1602588c87dfb1c3521 [file] [log] [blame] [view]
# SystemVerilog Syntax Tool
<!--*
freshness: { owner: 'hzeller' reviewed: '2020-10-15' }
*-->
`verible-verilog-syntax` checks SystemVerilog syntax, and provides some useful
options for examining lexed/parsed representations. When troubleshooting
Verible's SystemVerilog tools, start with this tool.
You can read about [lexer and parser implementation details here](../../parser).
## Usage
```
usage: verible-verilog-syntax [options] <file(s)...>
Flags from verilog/tools/syntax/verilog_syntax.cc:
--error_limit (Limit the number of syntax errors reported. (0: unlimited));
default: 0;
--export_json (Uses JSON for output. Intended to be used as an input for
other tools.); default: false;
--lang (Selects language variant to parse. Options:
auto: SystemVerilog-2017, but may auto-detect alternate parsing modes
sv: strict SystemVerilog-2017, with explicit alternate parsing modes
lib: Verilog library map language (LRM Ch. 33)
); default: auto;
--printrawtokens (Prints all lexed tokens, including filtered ones.);
default: false;
--printtokens (Prints all lexed and filtered tokens); default: false;
--printtree (Whether or not to print the tree); default: false;
--verifytree (Verifies that all tokens are parsed into tree, prints
unmatched tokens); default: false;
```
## Features
The parser supports
[alternative parsing modes](../../analysis#alternative-parsing-modes) where a
file is intended to be included in another context, such as module body items,
and can be triggered with comments near the top-of-file like `// verilog_syntax:
parse-as-module-body`.
## Token Stream Example
The following code:
```systemverilog
// This is module foo.
module foo(input a, b, output z);
endmodule : foo
```
produces the following tokens (shown using `--printrawtokens`):
```
All lexed tokens:
(#"// end of line comment" @0-22: "// This is module foo.")
(#"<<\\n>>" @22-23: "
")
(#"module" @23-29: "module")
(#"<<space>>" @29-30: " ")
(#SymbolIdentifier @30-33: "foo")
(#'(' @33-34: "(")
(#"input" @34-39: "input")
(#"<<space>>" @39-40: " ")
(#SymbolIdentifier @40-41: "a")
(#',' @41-42: ",")
(#"<<space>>" @42-43: " ")
(#SymbolIdentifier @43-44: "b")
(#',' @44-45: ",")
(#"<<space>>" @45-46: " ")
(#"output" @46-52: "output")
(#"<<space>>" @52-53: " ")
(#SymbolIdentifier @53-54: "z")
(#')' @54-55: ")")
(#';' @55-56: ";")
(#"<<\\n>>" @56-57: "
")
(#"endmodule" @57-66: "endmodule")
(#"<<space>>" @66-67: " ")
(#':' @67-68: ":")
(#"<<space>>" @68-69: " ")
(#SymbolIdentifier @69-72: "foo")
(#"<<\\n>>" @72-73: "
")
(#"<<\\n>>" @73-74: "
")
(#$end @74-74: "")
```
The token names (after `#`) correspond to description strings in the yacc
grammar file; keywords are shown the same as the text they match. Byte offsets
are shown as the range that follows '@'. The raw, unfiltered token stream is
lossless with respect to the original input text.
With `--printtokens`, you should see whitespace tokens filtered out.
## Concrete Syntax Tree Example
The following code (same as above):
```systemverilog
// This is module foo.
module foo(input a, b, output z);
endmodule : foo
```
produces this [concrete syntax tree (CST)](../../CST), rendered by
`verible-verilog-syntax --printtree`:
```
Parse Tree:
Node @0 (tag: kDescriptionList) {
Node @0 (tag: kModuleDeclaration) {
Node @0 (tag: kModuleHeader) {
Leaf @0 (#"module" @23-29: "module")
Leaf @2 (#SymbolIdentifier @30-33: "foo")
Node @5 (tag: kParenGroup) {
Leaf @0 (#'(' @33-34: "(")
Node @1 (tag: kPortDeclarationList) {
Node @0 (tag: kPortDeclaration) {
Leaf @0 (#"input" @34-39: "input")
Node @2 (tag: kDataType) {
}
Node @3 (tag: kUnqualifiedId) {
Leaf @0 (#SymbolIdentifier @40-41: "a")
}
Node @4 (tag: kUnpackedDimensions) {
}
}
Leaf @1 (#',' @41-42: ",")
Node @2 (tag: kPort) {
Node @0 (tag: kPortReference) {
Node @0 (tag: kUnqualifiedId) {
Leaf @0 (#SymbolIdentifier @43-44: "b")
}
}
}
Leaf @1 (#',' @41-42: ",")
Node @2 (tag: kPort) {
Node @0 (tag: kPortReference) {
Node @0 (tag: kUnqualifiedId) {
Leaf @0 (#SymbolIdentifier @43-44: "b")
}
}
}
Leaf @3 (#',' @44-45: ",")
Node @4 (tag: kPortDeclaration) {
Leaf @0 (#"output" @46-52: "output")
Node @2 (tag: kDataType) {
}
Node @3 (tag: kUnqualifiedId) {
Leaf @0 (#SymbolIdentifier @53-54: "z")
}
Node @4 (tag: kUnpackedDimensions) {
}
}
}
Leaf @2 (#')' @54-55: ")")
}
Leaf @7 (#';' @55-56: ";")
}
Node @1 (tag: kModuleItemList) {
}
Leaf @2 (#"endmodule" @57-66: "endmodule")
Node @3 (tag: kLabel) {
Leaf @0 (#':' @67-68: ":")
Leaf @1 (#SymbolIdentifier @69-72: "foo")
}
}
}
```
The `N` in `Node @N` or `Leaf @N` refers to the child rank of that node/leaf
with respect to its immediate parent node, starting at 0. `nullptr` nodes are
skipped and will look like gaps in the rank sequence.
Nodes of the CST may link to other nodes or leaves (which contain tokens). The
nodes are tagged with
[language-specific enumerations](../../CST/verilog_nonterminals.h). Each leaf
encapsulates a token and is shown with its corresponding byte-offsets in the
original text (as `@left-right`). Null nodes are not shown.
When `--export_json` flag is set, concrete syntax tree is printed as JSON
object. See [Parser tree object](#Parser-tree-object) below for details.
The exact structure of the SystemVerilog CST is fragile, and should not be
considered stable; at any time, node enumerations can be created or removed, and
subtree structures can be re-shaped. In the above example, `kModuleHeader` is an
implementation detail of a module definition's composition, and doesn't map
directly to a named grammar construct in the [SV-LRM]. The
[`verilog/CST`](../../CST) library provides functions that abstract away
internal structure.
## JSON output description
JSON root is an object which maps each input file name to an object containing
parsing result for that file.
### Parsing result object
| Key | Type | Description |
|-------------|--------|----------------------------------------------------------|
| `tokens` | array | List of [Token](#Token-object) objects, with whitespace tokens filtered out. Present only when `--printtokens` flag is specified. |
| `rawtokens` | array | List of [Token](#Token-object) objects. Present only when `--printrawtokens` flag is specified. |
| `tree` | object | [Parser tree](#Parser-tree). Present only when `--printtree` flag is specified and parsing errors didn't prevent tree creation. |
| `errors` | array | List of [Error](#Error-object) objects. Present only when there were any errors. |
#### Parser tree
The tree consist of [Node](#Node-object) and [Token](#Token-object) objects. The tree root is a Node object.
#### Node object
| Key | Type | Description |
|------------|--------|--------------------------------------------------------|
| `tag` | string | Node tag. See `NodeEnum` in [verilog\_nonterminals.h](../../CST/verilog_nonterminals.h) for available values. |
| `children` | array | List of children ([Node](#Node-object) and [Token](#Token-object), or `null`). |
#### Token object
| Key | Type | Description |
|-------------------|--------|----------------------------------------------------|
| `start`, `end` | int | Byte offset of token's first character and a character just past the symbol in source text. |
| `tag` | string | Token tag. See [Possible token tag values](#possible-token-tag-values) below for details. |
| `text` (optional) | string | Token text. Not present in operator and keyword token objects. |
To get token text, either use `text` value (if present), or read source file from byte `start` (included) to byte `end` (excluded). Example in Python:
```python
start = token["start"]
end = token["end"]
# Read source file contents as bytes
with open(source_file_path, "rb") as f:
source = f.read()
# Get token text from source file contents
text = source[start:end].decode("utf-8")
```
##### Possible token `tag` values
Token tag enumerations come from the [parser generator](../../parser/verilog.y), with a few overrides specified in [`verilog_token.cc`](../../parser/verilog_token.cc). There are 3 types of values:
* Named tokens (e.g. `SymbolIdentifier`, `TK_DecNumber`), which come from `%token TOKEN_TAG` lines.
* String literals (e.g. `module`, `==`), which come from `%token SOME_ID "token_tag"` lines.
* Single characters (e.g. `;`, `=`). They can be found using `'.'` regular expression.
#### Error object
| Key | Type | Description |
|------------------|--------|--------------------------------------------------|
| `line`, `column` | int | Line and column in source text. 0-based. |
| `text` | string | Character sequence which caused the error. |
| `phase` | string | Phase during which the error occured. One of: `lex`, `parse`, `preprocess`, `unknown`. |
| `message` | string | (optional) Error explanation. |
### Python examples and helper code
[`export_json_examples`](./export_json_examples) directory contains Python wrappers for `verible-verilog-syntax --export_json` ([`verible_verilog_syntax.py`](./export_json_examples/verible_verilog_syntax.py) file) and some examples.
<!-- reference links -->
[SV-LRM]: https://ieeexplore.ieee.org/document/8299595