| |
| \chapter{Optimizations} |
| \label{chapter:opt} |
| |
| Yosys employs a number of optimizations to generate better and cleaner results. |
| This chapter outlines these optimizations. |
| |
| \section{Simple Optimizations} |
| |
| The Yosys pass {\tt opt} runs a number of simple optimizations. This includes removing unused |
| signals and cells and const folding. It is recommended to run this pass after each major step |
| in the synthesis script. At the time of this writing the {\tt opt} pass executes the following |
| passes that each perform a simple optimization: |
| |
| \begin{itemize} |
| \item Once at the beginning of {\tt opt}: |
| \begin{itemize} |
| \item {\tt opt\_expr} |
| \item {\tt opt\_merge -nomux} |
| \end{itemize} |
| \item Repeat until result is stable: |
| \begin{itemize} |
| \item {\tt opt\_muxtree} |
| \item {\tt opt\_reduce} |
| \item {\tt opt\_merge} |
| \item {\tt opt\_rmdff} |
| \item {\tt opt\_clean} |
| \item {\tt opt\_expr} |
| \end{itemize} |
| \end{itemize} |
| |
| The following section describes each of the {\tt opt\_*} passes. |
| |
| \subsection{The opt\_expr pass} |
| |
| This pass performs const folding on the internal combinational cell types |
| described in Chap.~\ref{chapter:celllib}. This means a cell with all constant |
| inputs is replaced with the constant value this cell drives. In some cases |
| this pass can also optimize cells with some constant inputs. |
| |
| \begin{table} |
| \hfil |
| \begin{tabular}{cc|c} |
| A-Input & B-Input & Replacement \\ |
| \hline |
| any & 0 & 0 \\ |
| 0 & any & 0 \\ |
| 1 & 1 & 1 \\ |
| \hline |
| X/Z & X/Z & X \\ |
| 1 & X/Z & X \\ |
| X/Z & 1 & X \\ |
| \hline |
| any & X/Z & 0 \\ |
| X/Z & any & 0 \\ |
| \hline |
| $a$ & 1 & $a$ \\ |
| 1 & $b$ & $b$ \\ |
| \end{tabular} |
| \caption{Const folding rules for {\tt\$\_AND\_} cells as used in {\tt opt\_expr}.} |
| \label{tab:opt_expr_and} |
| \end{table} |
| |
| Table~\ref{tab:opt_expr_and} shows the replacement rules used for optimizing |
| an {\tt\$\_AND\_} gate. The first three rules implement the obvious const folding |
| rules. Note that `any' might include dynamic values calculated by other parts |
| of the circuit. The following three lines propagate undef (X) states. |
| These are the only three cases in which it is allowed to propagate an undef |
| according to Sec.~5.1.10 of IEEE Std. 1364-2005 \cite{Verilog2005}. |
| |
| The next two lines assume the value 0 for undef states. These two rules are only |
| used if no other substitutions are possible in the current module. If other substitutions |
| are possible they are performed first, in the hope that the `any' will change to |
| an undef value or a 1 and therefore the output can be set to undef. |
| |
| The last two lines simply replace an {\tt\$\_AND\_} gate with one constant-1 |
| input with a buffer. |
| |
| Besides this basic const folding the {\tt opt\_expr} pass can replace 1-bit wide |
| {\tt \$eq} and {\tt \$ne} cells with buffers or not-gates if one input is constant. |
| |
| The {\tt opt\_expr} pass is very conservative regarding optimizing {\tt \$mux} cells, |
| as these cells are often used to model decision-trees and breaking these trees can |
| interfere with other optimizations. |
| |
| \subsection{The opt\_muxtree pass} |
| |
| This pass optimizes trees of multiplexer cells by analyzing the select inputs. |
| Consider the following simple example: |
| |
| \begin{lstlisting}[numbers=left,frame=single,language=Verilog] |
| module uut(a, y); |
| input a; |
| output [1:0] y = a ? (a ? 1 : 2) : 3; |
| endmodule |
| \end{lstlisting} |
| |
| The output can never be 2, as this would require \lstinline[language=Verilog];a; |
| to be 1 for the outer multiplexer and 0 for the inner multiplexer. The {\tt |
| opt\_muxtree} pass detects this contradiction and replaces the inner multiplexer |
| with a constant 1, yielding the logic for \lstinline[language=Verilog];y = a ? 1 : 3;. |
| |
| \subsection{The opt\_reduce pass} |
| |
| \begin{sloppypar} |
| This is a simple optimization pass that identifies and consolidates identical input |
| bits to {\tt \$reduce\_and} and {\tt \$reduce\_or} cells. It also sorts the input |
| bits to ease identification of shareable {\tt \$reduce\_and} and {\tt \$reduce\_or} cells |
| in other passes. |
| \end{sloppypar} |
| |
| This pass also identifies and consolidates identical inputs to multiplexer cells. In this |
| case the new shared select bit is driven using a {\tt \$reduce\_or} cell that combines |
| the original select bits. |
| |
| Lastly this pass consolidates trees of {\tt \$reduce\_and} cells and trees of |
| {\tt \$reduce\_or} cells to single large {\tt \$reduce\_and} or {\tt \$reduce\_or} cells. |
| |
| These three simple optimizations are performed in a loop until a stable result is |
| produced. |
| |
| \subsection{The opt\_rmdff pass} |
| |
| This pass identifies single-bit d-type flip-flops ({\tt \$\_DFF\_*}, {\tt \$dff}, and {\tt |
| \$adff} cells) with a constant data input and replaces them with a constant driver. |
| |
| \subsection{The opt\_clean pass} |
| |
| This pass identifies unused signals and cells and removes them from the design. It also |
| creates an \B{unused\_bits} attribute on wires with unused bits. This attribute can be |
| used for debugging or by other optimization passes. |
| |
| \subsection{The opt\_merge pass} |
| |
| This pass performs trivial resource sharing. This means that this pass identifies cells |
| with identical inputs and replaces them with a single instance of the cell. |
| |
| The option {\tt -nomux} can be used to disable resource sharing for multiplexer |
| cells ({\tt \$mux} and {\tt \$pmux}. This can be useful as |
| it prevents multiplexer trees to be merged, which might prevent {\tt opt\_muxtree} |
| to identify possible optimizations. |
| |
| \section{FSM Extraction and Encoding} |
| |
| The {\tt fsm} pass performs finite-state-machine (FSM) extraction and recoding. The {\tt fsm} |
| pass simply executes the following other passes: |
| |
| \begin{itemize} |
| \item Identify and extract FSMs: |
| \begin{itemize} |
| \item {\tt fsm\_detect} |
| \item {\tt fsm\_extract} |
| \end{itemize} |
| |
| \item Basic optimizations: |
| \begin{itemize} |
| \item {\tt fsm\_opt} |
| \item {\tt opt\_clean} |
| \item {\tt fsm\_opt} |
| \end{itemize} |
| |
| \item Expanding to nearby gate-logic (if called with {\tt -expand}): |
| \begin{itemize} |
| \item {\tt fsm\_expand} |
| \item {\tt opt\_clean} |
| \item {\tt fsm\_opt} |
| \end{itemize} |
| |
| \item Re-code FSM states (unless called with {\tt -norecode}): |
| \begin{itemize} |
| \item {\tt fsm\_recode} |
| \end{itemize} |
| |
| \item Print information about FSMs: |
| \begin{itemize} |
| \item {\tt fsm\_info} |
| \end{itemize} |
| |
| \item Export FSMs in KISS2 file format (if called with {\tt -export}): |
| \begin{itemize} |
| \item {\tt fsm\_export} |
| \end{itemize} |
| |
| \item Map FSMs to RTL cells (unless called with {\tt -nomap}): |
| \begin{itemize} |
| \item {\tt fsm\_map} |
| \end{itemize} |
| \end{itemize} |
| |
| The {\tt fsm\_detect} pass identifies FSM state registers and marks them using the |
| \B{fsm\_encoding}{\tt = "auto"} attribute. The {\tt fsm\_extract} extracts all |
| FSMs marked using the \B{fsm\_encoding} attribute (unless \B{fsm\_encoding} is |
| set to {\tt "none"}) and replaces the corresponding RTL cells with a {\tt \$fsm} |
| cell. All other {\tt fsm\_*} passes operate on these {\tt \$fsm} cells. The |
| {\tt fsm\_map} call finally replaces the {\tt \$fsm} cells with RTL cells. |
| |
| Note that these optimizations operate on an RTL netlist. I.e.~the {\tt fsm} pass |
| should be executed after the {\tt proc} pass has transformed all |
| {\tt RTLIL::Process} objects to RTL cells. |
| |
| The algorithms used for FSM detection and extraction are influenced by a more |
| general reported technique \cite{fsmextract}. |
| |
| \subsection{FSM Detection} |
| |
| The {\tt fsm\_detect} pass identifies FSM state registers. It sets the |
| \B{fsm\_encoding}{\tt = "auto"} attribute on any (multi-bit) wire that matches |
| the following description: |
| |
| \begin{itemize} |
| \item Does not already have the \B{fsm\_encoding} attribute. |
| \item Is not an output of the containing module. |
| \item Is driven by single {\tt \$dff} or {\tt \$adff} cell. |
| \item The \B{D}-Input of this {\tt \$dff} or {\tt \$adff} cell is driven by a multiplexer |
| tree that only has constants or the old state value on its leaves. |
| \item The state value is only used in the said multiplexer tree or by simple relational |
| cells that compare the state value to a constant (usually {\tt \$eq} cells). |
| \end{itemize} |
| |
| This heuristic has proven to work very well. It is possible to overwrite it by setting |
| \B{fsm\_encoding}{\tt = "auto"} on registers that should be considered FSM state registers |
| and setting \B{fsm\_encoding}{\tt = "none"} on registers that match the above criteria |
| but should not be considered FSM state registers. |
| |
| Note however that marking state registers with \B{fsm\_encoding} that are not |
| suitable for FSM recoding can cause synthesis to fail or produce invalid |
| results. |
| |
| \subsection{FSM Extraction} |
| |
| The {\tt fsm\_extract} pass operates on all state signals marked with the |
| \B{fsm\_encoding} ({\tt != "none"}) attribute. For each state signal the following |
| information is determined: |
| |
| \begin{itemize} |
| \item The state registers |
| \item The asynchronous reset state if the state registers use asynchronous reset |
| \item All states and the control input signals used in the state transition functions |
| \item The control output signals calculated from the state signals and control inputs |
| \item A table of all state transitions and corresponding control inputs- and outputs |
| \end{itemize} |
| |
| The state registers (and asynchronous reset state, if applicable) is simply determined |
| by identifying the driver for the state signal. |
| |
| From there the {\tt \$mux}-tree driving the state register inputs is |
| recursively traversed. All select inputs are control signals and the leaves of the |
| {\tt \$mux}-tree are the states. The algorithm fails if a non-constant leaf |
| that is not the state signal itself is found. |
| |
| The list of control outputs is initialized with the bits from the state signal. |
| It is then extended by adding all values that are calculated by cells that |
| compare the state signal with a constant value. |
| |
| In most cases this will cover all uses of the state register, thus rendering the |
| state encoding arbitrary. If however a design uses e.g.~a single bit of the state |
| value to drive a control output directly, this bit of the state signal will be |
| transformed to a control output of the same value. |
| |
| Finally, a transition table for the FSM is generated. This is done by using the |
| {\tt ConstEval} C++ helper class (defined in {\tt kernel/consteval.h}) that can |
| be used to evaluate parts of the design. The {\tt ConstEval} class can be asked |
| to calculate a given set of result signals using a set of signal-value |
| assignments. It can also be passed a list of stop-signals that abort the {\tt |
| ConstEval} algorithm if the value of a stop-signal is needed in order to |
| calculate the result signals. |
| |
| The {\tt fsm\_extract} pass uses the {\tt ConstEval} class in the following way |
| to create a transition table. For each state: |
| |
| \begin{enumerate} |
| \item Create a {\tt ConstEval} object for the module containing the FSM |
| \item Add all control inputs to the list of stop signals |
| \item Set the state signal to the current state |
| \item Try to evaluate the next state and control output \label{enum:fsm_extract_cealg_try} |
| \item If step~\ref{enum:fsm_extract_cealg_try} was not successful: |
| \begin{itemize} |
| \item Recursively goto step~\ref{enum:fsm_extract_cealg_try} with the offending stop-signal set to 0. |
| \item Recursively goto step~\ref{enum:fsm_extract_cealg_try} with the offending stop-signal set to 1. |
| \end{itemize} |
| \item If step~\ref{enum:fsm_extract_cealg_try} was successful: Emit transition |
| \end{enumerate} |
| |
| Finally a {\tt \$fsm} cell is created with the generated transition table and added to the |
| module. This new cell is connected to the control signals and the old drivers for the |
| control outputs are disconnected. |
| |
| \subsection{FSM Optimization} |
| |
| The {\tt fsm\_opt} pass performs basic optimizations on {\tt \$fsm} cells (not including state |
| recoding). The following optimizations are performed (in this order): |
| |
| \begin{itemize} |
| \item Unused control outputs are removed from the {\tt \$fsm} cell. The attribute \B{unused\_bits} |
| (that is usually set by the {\tt opt\_clean} pass) is used to determine which control |
| outputs are unused. |
| \item Control inputs that are connected to the same driver are merged. |
| \item When a control input is driven by a control output, the control input is removed and the transition |
| table altered to give the same performance without the external feedback path. |
| \item Entries in the transition table that yield the same output and only |
| differ in the value of a single control input bit are merged and the different bit is removed |
| from the sensitivity list (turned into a don't-care bit). |
| \item Constant inputs are removed and the transition table is altered to give an unchanged behaviour. |
| \item Unused inputs are removed. |
| \end{itemize} |
| |
| \subsection{FSM Recoding} |
| |
| The {\tt fsm\_recode} pass assigns new bit pattern to the states. Usually this |
| also implies a change in the width of the state signal. At the moment of this |
| writing only one-hot encoding with all-zero for the reset state is supported. |
| |
| The {\tt fsm\_recode} pass can also write a text file with the changes performed |
| by it that can be used when verifying designs synthesized by Yosys using Synopsys |
| Formality \citeweblink{Formality}. |
| |
| \section{Logic Optimization} |
| |
| Yosys can perform multi-level combinational logic optimization on gate-level netlists using the |
| external program ABC \citeweblink{ABC}. The {\tt abc} pass extracts the combinational gate-level |
| parts of the design, passes it through ABC, and re-integrates the results. The {\tt abc} pass |
| can also be used to perform other operations using ABC, such as technology mapping (see |
| Sec.~\ref{sec:techmap_extern} for details). |
| |