blob: b45c976637add8932818593aa42ba46e4aea5f62 [file] [log] [blame] [view]
# Black Parrot Front End
The Front End (FE) consists of two different stages: PC Generation (PC_GEN) and Instruction Fetch
(IF). These two stages receive the same PC value at the beginning of the stage; as a result, we
consider them as one stage. PC_GEN specifies the next virtual PC value that will be fetched. The
IF stage receives the virtual PC from the PC_GEN, fetches the next instruction and enqueues the
fetched PC/Instruction pair alongside branch prediction metadata into the frontend queue.
The FE of the BlackParrot processor contains the following components:
1. PC\_GEN (including the PC\_GEN logic and Branch Prediction Logic)
2. Instruction TLB (I-TLB)
3. Instruction Cache (I-Cache)
Note that in the current implementation, the I-TLB is a pass-through module.
## Table of Contents
* [Black Parrot Front End](#Black-Parrot-Front-End)
1. [PC Generation](#PC-Generation)
2. [I-Cache](#I-Cache)
## PC Generation
*pc\_gen.v* provides the interfaces for the PC\_GEN logics and also interfacing other modules in the
FE. PC\_GEN provides the pc for the I-TLB and I-Cache. PC\_GEN also provides the BTB, BHT and
RAS indexes for the Back End (BE) (the queue between the FE and the FE, i.e. the frontend queue).
### Parameter
* __vaddr\_width\_p__ - the width of the virtual addresses
* __paddr\_width\_p__ - the width of the physical addresses
* __eaddr\_width\_p__ - the width of the effective addresses
* __btb\_indx\_width\_p__ - the width of the BTB indexes
* __bht\_indx\_width\_p__ - the width of the BHT indexes
* __ras\_addr\_width\_p__ - the width of the RAS addresses
* __instr\_width\_p__ - the instruction width
* __asid\_width\_p__ - the width of ASID
* __bp\_first\_pc_p__ - the first pc when the processor boots up
### Interfaces
* __pc\_gen\_icache\_o__ (ready -> valid)
* __icache\_pc\_gen\_i__ (always ready)
* __icache\_miss\_i__ (icache miss signal)
* __pc\_gen\_itlb\_o__ (valid -> ready)
* __pc\_gen\_fe\_o__ (ready -> valid)
* __fe\_pc\_gen\_i__ (valid -> ready)
### Branch Target Buffer (BTB)
Branch Target Buffer (BTB) stores the addresses of the branch targets and the corresponding branch
sites. Branch happens from the branch sites to the branch targets. In order to save the logic
sizes, the BTB is designed to have limited entries for storing the branch sites, branch target
pairs. The implementation uses the bsg\_mem\_1r1w RAM design.
#### Parameters
* __bp\_fe\_pc\_gen\_btb\_idx\_width\_lp__ - the number of bits to index the BTB
* __eaddr\_width\_p__ - the width of the effective address
#### Interfaces
* __btb\_idx\_w\_i__ - the bit to index the BTB writing (for updating the BTB)
* __btb\_idx\_r\_i__ - the bit to index the BTB reading (for prediction)
* __btb\_r\_i__ - the bit to enable the BTB reading (for updating the BTB)
* __btb\_w\_i__ - the bit to enable the BTB writing (for prediction)
* __branch\_target\_i__ - the input for the BTB entry (for updating the BTB)
* __branch\_target\_o__ - the output target (for the prediction)
* __read\_valid\_o__ - the bit that tells whether the BTB entry has been written before reading
### Branch History Table (BHT)
Branch History Table (BHT) records a history of branch prediction results, and predict whether next
branch should be taken or not. After each prediction, the back-end (BE) informs the front-end (FE)
whether the previous prediction is correct or not. The BHT will update the corresponding entry
according to the previous results. The two bits in each entry of the BHT follows the rule in the
table.
| Bit 1 | Bit 0 |
|--------------------|---------------|
| taken or not taken | strong or weak|
1. If the entry in the BHT shows that the previous prediction is strong, and the BE informs the FE
that the previous prediction is correct, the BHT does not update any of the entry.
2. If the entry in the BHT indicates that the previous prediction is weak, and at the same time, the
BE informs the FE that the previous prediction is correct, the BHT will update its entry to make
it strong prediction.
3. If the entry in the BHT shows that the previous prediction is strong, but the BE informs the FE
that the previous prediction is wrong, the BHT changes the strong prediction to weak prediction
in the corresponding entry.
4. If the entry in the BHT indicates that the previous prediction is weak and at the same time, the
BE informs the FE that the previous prediction is wrong, the BHT will change the prediction either
from taken to not taken, or from not taken to taken, in the corresponding entry.
During the branch prediction, the FE reads the corresponding entry taken or not taken bit (Bit 1) to
predict. If the Bit 1 is 1, then the FE take the branch prediction. If the Bit 1 is 0, the FE does
not take the branch prediction.
#### Parameters
* __bht\_indx\_width\_p__ - the number of bits used to index the BHT
* __els\_lp__ - the number of entries in the BHT
#### Interfaces
* __bht\_idx\_r\_i__ - the index used for reading the BHT (for prediction)
* __bht\_idx\_w\_i__ - the index used for writing the BHT (for updating the BHT)
* __bht\_r\_i__ - the read enable bit
* __bht\_w\_i__ - the write enable bit
* __correct\_i__ - the bit used to indicate correct/wrong prediction (for updating the BHT entry)
* __predict\_o__ - the bit used to predict next branch
## I-Cache
The I-Cache (I$) is implemented as a virtually-indexed physically-tagged cache. The I-Cache module
consists of two components: cache logic and Local Cache Engine (LCE). The cache logic is a
two-staged pipelined cache (consisting of Tag-Lookup (TL) stage and Tag-Verify (TV) stage) and the
LCE is the cache entity participating in coherence.
The file *bp\_fe\_icache.v* defines the top level I-Cache module. This module is instantiated once
per Black Parrot multi-core processor. This module implements the cache logic and instantiates the
LCE module.
### Parameters
* __eaddr\_width\_p__ \- effective address width
* __data\_width\_p__ \- data width
* __instr\_width\_p__ \- instruction width
* __tag\_width\_p__ \- tag width
* __num\_cce\_p__ \- number of CCEs in the system
* __num\_lce\_p__ \- number of LCEs in the system
* __lce\_id\_p__ \- ID of this LCE in the system
* __lce\_assoc\_p__ \- Associativity of this LCE
* __lce\_sets\_p__ \- Number of sets in this LCE
* __lce\_states\_p__ \- Number of coherency states for the LCE
* __block\_size\_in\_bytes\_p__ \- The cache line (block) size in bytes
### Interfaces
The I-Cache receives the virtual pc from the PC_GEN and physical pc from I-TLB; then, it responds to
PC_GEN by fetching the corresponding instruction. In case of a cache miss, the LCE sends a request
to the CCE and waits for a response. Additionally, LCE handles other incoming messages from the
CCE.
* __pc_gen_icache_vaddr__ \- valid->ready
* __icache_pc_gen_data__ \- valid
* __itlb_icache_data_resp__ \- valid->ready
* __LCE to CCE__ \- ready->valid
* __CCE to LCE__ \- ready->valid
* __LCE to LCE (inbound)__ \- ready->valid
* __LCE to LCE (outbound)__ \- ready->valid
#### Local Cache Engine (LCE)
The file *bp\_fe\_lce.v* defines the top level LCE module.
### Parameters
* __eaddr\_width\_p__ \- effective address width
* __data\_width\_p__ \- data width
* __instr\_width\_p__ \- instruction width
* __tag\_width\_p__ \- tag width
* __num\_cce\_p__ \- number of CCEs in the system
* __num\_lce\_p__ \- number of LCEs in the system
* __lce\_id\_p__ \- ID of this LCE in the system
* __lce\_assoc\_p__ \- Associativity of this LCE
* __lce\_sets\_p__ \- Number of sets in this LCE
* __lce\_states\_p__ \- Number of coherency states for the LCE
* __block\_size\_in\_bytes\_p__ \- The cache line (block) size in bytes
### Interfaces
In addition to the LCE and CCE messages, the LCE interface consists of the data_mem, tag_mem, and
meta_data_mem packets sent to the cache.
* __data_mem_pkt__ \- valid->yumi
* __tag_mem_pkt__ \- valid->yumi
* __meta_data_mem_pkt__ \- valid->yumi
* __LCE to CCE__ \- ready->valid
* __CCE to LCE__ \- ready->valid
* __LCE to LCE (inbound)__ \- ready->valid
* __LCE to LCE (outbound)__ \- ready->valid