verible-verilog-syntax checks SystemVerilog syntax and provides some useful options for examining lexed/parsed representations. When troubleshooting Verible's SystemVerilog tools, start with this tool.
You can read about lexer and parser implementation details here.
usage: verible-verilog-syntax [options] <file(s)...>

Flags from verilog/tools/syntax/verilog_syntax.cc:
  --error_limit (Limit the number of syntax errors reported. (0: unlimited)); default: 0;
  --export_json (Uses JSON for output. Intended to be used as an input for other tools.); default: false;
  --lang (Selects language variant to parse. Options:
    auto: SystemVerilog-2017, but may auto-detect alternate parsing modes
    sv: strict SystemVerilog-2017, with explicit alternate parsing modes
    lib: Verilog library map language (LRM Ch. 33)
    ); default: auto;
  --printrawtokens (Prints all lexed tokens, including filtered ones.); default: false;
  --printtokens (Prints all lexed and filtered tokens); default: false;
  --printtree (Whether or not to print the tree); default: false;
  --verifytree (Verifies that all tokens are parsed into tree, prints unmatched tokens); default: false;
The parser supports alternative parsing modes for files that are intended to be included in another context, such as module body items. These modes can be triggered with comments near the top of the file, like // verilog_syntax: parse-as-module-body.
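As a minimal sketch of driving the tool from a script, the helper below assembles a command line from the flags listed in the usage text and decodes the result. The function names `build_command` and `run_syntax` are hypothetical. The sketch assumes that the JSON document is written to standard output and that the exit status may be non-zero when a file contains syntax errors, so it does not treat a non-zero status as a failure.

```python
import json
import subprocess


def build_command(paths, print_tree=True, print_tokens=False, print_rawtokens=False):
    """Assemble an argv list using only flags from the usage text."""
    cmd = ["verible-verilog-syntax", "--export_json"]
    if print_tree:
        cmd.append("--printtree")
    if print_tokens:
        cmd.append("--printtokens")
    if print_rawtokens:
        cmd.append("--printrawtokens")
    return cmd + list(paths)


def run_syntax(paths, **options):
    """Run the tool and decode its JSON output.

    Assumptions: the JSON is written to standard output, and the exit
    status may be non-zero for files with syntax errors (hence check=False).
    """
    proc = subprocess.run(
        build_command(paths, **options),
        stdout=subprocess.PIPE,
        encoding="utf-8",
        check=False,
    )
    return json.loads(proc.stdout)
```

For example, `run_syntax(["foo.sv"], print_tokens=True)` would request both the tree and the filtered token list for a hypothetical `foo.sv`.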
The following code:
// This is module foo.
module foo(input a, b, output z);
endmodule : foo
produces the following tokens (shown using --printrawtokens):

All lexed tokens:
(#"// end of line comment" @0-22: "// This is module foo.")
(#"<<\\n>>" @22-23: "
")
(#"module" @23-29: "module")
(#"<<space>>" @29-30: " ")
(#SymbolIdentifier @30-33: "foo")
(#'(' @33-34: "(")
(#"input" @34-39: "input")
(#"<<space>>" @39-40: " ")
(#SymbolIdentifier @40-41: "a")
(#',' @41-42: ",")
(#"<<space>>" @42-43: " ")
(#SymbolIdentifier @43-44: "b")
(#',' @44-45: ",")
(#"<<space>>" @45-46: " ")
(#"output" @46-52: "output")
(#"<<space>>" @52-53: " ")
(#SymbolIdentifier @53-54: "z")
(#')' @54-55: ")")
(#';' @55-56: ";")
(#"<<\\n>>" @56-57: "
")
(#"endmodule" @57-66: "endmodule")
(#"<<space>>" @66-67: " ")
(#':' @67-68: ":")
(#"<<space>>" @68-69: " ")
(#SymbolIdentifier @69-72: "foo")
(#"<<\\n>>" @72-73: "
")
(#"<<\\n>>" @73-74: "
")
(#$end @74-74: "")
The token names (after #) correspond to description strings in the yacc grammar file; keywords are shown the same as the text they match. Byte offsets are shown as the range that follows '@'. The raw, unfiltered token stream is lossless with respect to the original input text.

With --printtokens, you should see whitespace tokens filtered out.
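The entry format described above can be picked apart with a regular expression. The following is a rough sketch, not part of Verible: `TOKEN_RE` and `parse_token` are hypothetical names, and the pattern only assumes the `(#TAG @START-END: "TEXT")` shape seen in the listing. The source here is pure ASCII, so byte offsets and string indices coincide.

```python
import re

# One entry of the token listing: (#TAG @START-END: "TEXT")
TOKEN_RE = re.compile(
    r'\(#(?P<tag>.+?) @(?P<start>\d+)-(?P<end>\d+): "(?P<text>.*)"\)',
    re.DOTALL,  # the text of a newline token contains a line break
)


def parse_token(entry):
    """Split one listing entry into (tag, start, end, text)."""
    match = TOKEN_RE.fullmatch(entry)
    if match is None:
        raise ValueError(f"not a token entry: {entry!r}")
    return match["tag"], int(match["start"]), int(match["end"]), match["text"]


# The example source from this document (ASCII only).
SOURCE = "// This is module foo.\nmodule foo(input a, b, output z);\nendmodule : foo\n\n"

# A few entries copied from the listing above.
ENTRIES = [
    '(#"module" @23-29: "module")',
    '(#SymbolIdentifier @30-33: "foo")',
    "(#'(' @33-34: \"(\")",
    '(#"<<\\\\n>>" @56-57: "\n")',
    '(#$end @74-74: "")',
]

for entry in ENTRIES:
    tag, start, end, text = parse_token(entry)
    # Each offset range indexes straight into the original text.
    assert SOURCE[start:end] == text
```

Because every raw token carries its exact range, concatenating the text of all raw tokens in order would reproduce the input, which is what "lossless" means here.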
The following code (same as above):
// This is module foo.
module foo(input a, b, output z);
endmodule : foo

produces this concrete syntax tree (CST), rendered by verible-verilog-syntax --printtree:
Parse Tree:
Node @0 (tag: kDescriptionList) {
  Node @0 (tag: kModuleDeclaration) {
    Node @0 (tag: kModuleHeader) {
      Leaf @0 (#"module" @23-29: "module")
      Leaf @2 (#SymbolIdentifier @30-33: "foo")
      Node @5 (tag: kParenGroup) {
        Leaf @0 (#'(' @33-34: "(")
        Node @1 (tag: kPortDeclarationList) {
          Node @0 (tag: kPortDeclaration) {
            Leaf @0 (#"input" @34-39: "input")
            Node @2 (tag: kDataType) {
            }
            Node @3 (tag: kUnqualifiedId) {
              Leaf @0 (#SymbolIdentifier @40-41: "a")
            }
            Node @4 (tag: kUnpackedDimensions) {
            }
          }
          Leaf @1 (#',' @41-42: ",")
          Node @2 (tag: kPort) {
            Node @0 (tag: kPortReference) {
              Node @0 (tag: kUnqualifiedId) {
                Leaf @0 (#SymbolIdentifier @43-44: "b")
              }
            }
          }
          Leaf @3 (#',' @44-45: ",")
          Node @4 (tag: kPortDeclaration) {
            Leaf @0 (#"output" @46-52: "output")
            Node @2 (tag: kDataType) {
            }
            Node @3 (tag: kUnqualifiedId) {
              Leaf @0 (#SymbolIdentifier @53-54: "z")
            }
            Node @4 (tag: kUnpackedDimensions) {
            }
          }
        }
        Leaf @2 (#')' @54-55: ")")
      }
      Leaf @7 (#';' @55-56: ";")
    }
    Node @1 (tag: kModuleItemList) {
    }
    Leaf @2 (#"endmodule" @57-66: "endmodule")
    Node @3 (tag: kLabel) {
      Leaf @0 (#':' @67-68: ":")
      Leaf @1 (#SymbolIdentifier @69-72: "foo")
    }
  }
}
The N in Node @N or Leaf @N refers to the child rank of that node/leaf with respect to its immediate parent node, starting at 0. nullptr nodes are skipped and will look like gaps in the rank sequence.

Nodes of the CST may link to other nodes or leaves (which contain tokens). The nodes are tagged with language-specific enumerations. Each leaf encapsulates a token and is shown with its corresponding byte-offsets in the original text (as @left-right). Null nodes are not shown.
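To illustrate how skipped null children produce gaps in the rank sequence, here is a small sketch that prints a tree in a similar style. It is not Verible's printer: the `render` function and the tuple-based tree (a `(tag, children)` pair for a node, a plain string for a leaf, `None` for a null child) are stand-ins chosen for brevity, and the leaf lines are simplified.

```python
def render(symbol, rank=0, depth=0):
    """Return printtree-like lines for a toy tree.

    A node is a (tag, children) tuple, a leaf is a string holding the
    token text, and None marks a null child.
    """
    pad = "  " * depth
    if isinstance(symbol, str):
        return [f'{pad}Leaf @{rank} ("{symbol}")']
    tag, children = symbol
    lines = [f"{pad}Node @{rank} (tag: {tag}) {{"]
    for child_rank, child in enumerate(children):
        if child is None:
            continue  # a null child keeps its rank but prints nothing
        lines.extend(render(child, child_rank, depth + 1))
    lines.append(f"{pad}}}")
    return lines


# Toy version of the module header above: ranks 1, 3, 4 and 6 are null.
header = (
    "kModuleHeader",
    [
        "module",
        None,
        "foo",
        None,
        None,
        ("kParenGroup", ["(", ("kPortDeclarationList", []), ")"]),
        None,
        ";",
    ],
)

print("\n".join(render(header)))
```

The printed children of kModuleHeader carry ranks 0, 2, 5 and 7, matching the gaps visible in the real output.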
When the --export_json flag is set, the concrete syntax tree is printed as a JSON object. See Parser tree object below for details.
The exact structure of the SystemVerilog CST is fragile, and should not be considered stable; at any time, node enumerations can be created or removed, and subtree structures can be re-shaped. In the above example, kModuleHeader is an implementation detail of a module definition's composition, and doesn't map directly to a named grammar construct in the SV-LRM. The verilog/CST library provides functions that abstract away internal structure.
The JSON root is an object that maps each input file name to an object containing the parsing result for that file.
Key | Type | Description |
---|---|---|
tokens | array | List of Token objects, with whitespace tokens filtered out. Present only when --printtokens flag is specified. |
rawtokens | array | List of Token objects. Present only when --printrawtokens flag is specified. |
tree | object | Parser tree. Present only when --printtree flag is specified and parsing errors didn't prevent tree creation. |
errors | array | List of Error objects. Present only when there were any errors. |
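A short sketch of consuming the root object follows. The `SAMPLE` document is hand-written to match the keys in the table above (it is not captured tool output), and `summarize` is a hypothetical helper. Since every key is optional, the code uses membership tests and `dict.get` rather than direct indexing.

```python
import json

# Hand-written sample shaped like the table above; not real tool output.
SAMPLE = """
{
  "good.sv": {"tree": {"tag": "kDescriptionList", "children": []}},
  "bad.sv": {"errors": [{"line": 0, "column": 7, "text": "(", "phase": "parse"}]}
}
"""


def summarize(results):
    """Report, per input file, which optional keys are present."""
    summary = {}
    for path, data in results.items():
        summary[path] = {
            "has_tree": "tree" in data,
            "error_count": len(data.get("errors", [])),
            "token_count": len(data.get("tokens", [])),
        }
    return summary


summary = summarize(json.loads(SAMPLE))
for path, info in summary.items():
    print(path, info)
```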
The tree consists of Node and Token objects. The tree root is a Node object.
Key | Type | Description |
---|---|---|
tag | string | Node tag. See NodeEnum in verilog_nonterminals.h for available values. |
children | array | List of children (Node and Token, or null). |
Key | Type | Description |
---|---|---|
start, end | int | Byte offsets of the token's first character and of the character just past the token in the source text. |
tag | string | Token tag. See Possible token tag values below for details. |
text (optional) | string | Token text. Not present in operator and keyword token objects. |
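Given the two object shapes above, the leaves of a tree can be collected with a short recursive walk. This is a sketch under one assumption: a Node is recognized by the presence of a `children` key, which Token objects do not have according to the tables. The `iter_tokens` name and the abridged `header` sample are made up for illustration.

```python
def iter_tokens(symbol):
    """Yield every Token object below `symbol`, depth-first, left to right."""
    if symbol is None:
        return  # null child
    if "children" in symbol:
        for child in symbol["children"]:
            yield from iter_tokens(child)
    else:
        yield symbol


# Hand-written, abridged Node object; offsets taken from the example above.
header = {
    "tag": "kModuleHeader",
    "children": [
        {"tag": "module", "start": 23, "end": 29},
        None,
        {"tag": "SymbolIdentifier", "start": 30, "end": 33, "text": "foo"},
    ],
}

for token in iter_tokens(header):
    print(token["tag"], token["start"], token["end"])
```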
To get the token text, either use the text value (if present), or read the source file from byte start (inclusive) to byte end (exclusive). Example in Python:
start = token["start"]
end = token["end"]
# Read source file contents as bytes
with open(source_file_path, "rb") as f:
    source = f.read()
# Get token text from source file contents
text = source[start:end].decode("utf-8")
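The two options can be combined in one helper, sketched below with a hypothetical `token_text` function. The sample also shows why the slice has to be taken on bytes rather than on a decoded string: a non-ASCII character earlier in the file shifts byte offsets relative to character indices. The offsets in the sample token are computed by hand for this made-up source, and UTF-8 encoding is assumed, as in the snippet above.

```python
def token_text(token, source_bytes):
    """Return the token text, preferring the optional `text` key."""
    if "text" in token:
        return token["text"]
    return source_bytes[token["start"]:token["end"]].decode("utf-8")


# Made-up source: the "é" in the comment occupies two bytes in UTF-8.
source_bytes = "// café\nmodule m;\nendmodule\n".encode("utf-8")

# Keyword token without a `text` key; byte offsets computed by hand.
module_token = {"tag": "module", "start": 9, "end": 15}

print(token_text(module_token, source_bytes))
# Slicing the decoded string with the same offsets is off by one character.
print(repr(source_bytes.decode("utf-8")[9:15]))
```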
Possible token tag values

Token tag enumerations come from the parser generator, with a few overrides specified in verilog_token.cc. There are 3 types of values:

- Named tokens (e.g. SymbolIdentifier, TK_DecNumber), which come from %token TOKEN_TAG lines.
- Textual tokens such as keywords and multi-character operators (e.g. module, ==), which come from %token SOME_ID "token_tag" lines.
- Single-character tokens (e.g. ;, =). They can be found using the '.' regular expression.

Key | Type | Description |
---|---|---|
line, column | int | Line and column in source text. 0-based. |
text | string | Character sequence which caused the error. |
phase | string | Phase during which the error occurred. One of: lex, parse, preprocess, unknown. |
message | string | (optional) Error explanation. |
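As a sketch of turning an Error object into a compiler-style diagnostic, the hypothetical `format_error` helper below converts the 0-based line and column to the 1-based positions most editors display, and appends the optional message only when it is present. The output layout is a choice made for this example, not a format produced by the tool.

```python
def format_error(path, error):
    """Format one Error object as 'path:line:column: phase error at TEXT'."""
    # line and column are 0-based in the JSON; most editors count from 1.
    location = f"{path}:{error['line'] + 1}:{error['column'] + 1}"
    detail = f"{error['phase']} error at {error['text']!r}"
    message = error.get("message", "")
    return f"{location}: {detail}" + (f": {message}" if message else "")


# Hand-written Error object following the table above.
sample_error = {"line": 0, "column": 7, "text": "(", "phase": "parse"}
print(format_error("bad.sv", sample_error))
```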
The export_json_examples directory contains Python wrappers for verible-verilog-syntax --export_json (the verible_verilog_syntax.py file) and some examples.