| # Commit Procedures |
| |
| ## For external developers |
| See [Submitting Code to VTR](CONTRIBUTING.md#submitting-code-to-vtr). |
| |
| ## For developers with commit rights |
| The guiding principle in internal development is to submit your work into the repository without breaking other people's work. |
| When you commit, make sure that the repository compiles, that the flow runs, and that you did not clobber someone else's work. |
| In the event that you are responsible for "breaking the build", fix the build at top priority. |
| |
| We have some guidelines in place to help catch most of these problems: |
| |
| 1. Before you push code to the central repository, your code MUST pass the check-in regression test. |
| The check-in regression test is a quick way to test if any part of the VTR flow is broken. |
| |
| At a minimum you must run: |
| ```shell |
| #From the VTR root directory |
| $ ./run_reg_test.pl vtr_reg_basic |
| ``` |
| You may push if all the tests return `All tests passed`. |
| |
| However you are strongly encouraged to run both the *basic* and *strong* regression tests: |
| ```shell |
| #From the VTR root directory |
| $ ./run_reg_test.pl vtr_reg_basic vtr_reg_strong |
| ``` |
| since it performs much more thorough testing. |
| |
| It is typically a good idea to run tests regularily as you make changes. |
| If you have failures see [how to debugging failed tests](#debugging-failed-tests). |
| |
| 2. The automated [BuildBot](http://builds.verilogtorouting.org:8080/waterfall) will perform more extensive regressions tests and mark which revisions are stable. |
| |
| 3. Everyone who is doing development must write regression tests for major feature they create. |
| This ensures regression testing will detect if a feature is broken by someone (or yourself). |
| See [Adding Tests](#adding-tests) for details. |
| |
| 4. In the event a regression test is broken, the one responsible for having the test pass is in charge of determining: |
| * If there is a bug in the source code, in which case the source code needs to be updated to fix the bug, or |
| * If there is a problem with the test (perhaps the quality of the tool did in fact get better or perhaps there is a bug with the test itself), in which case the test needs to be updated to reflect the new changes. |
| |
| If the golden results need to be updated and you are sure that the new golden results are better, use the command `../scripts/parse_vtr_task.pl -create_golden your_regression_test_name_here` |
| |
| 5. Keep in sync with the master branch as regularly as you can (i.e. `git pull` or `git pull --rebase`). |
| The longer code deviates from the trunk, the more painful it is to integrate back into the trunk. |
| |
| Whatever system that we come up with will not be foolproof so be conscientious about how your changes will effect other developers. |
| |
| # Code Formatting |
| |
| Some parts of the VTR code base (e.g. VPR, libarchfpga, libvtrutil) have code formatting requirements which are checked automatically by regression tests. |
| If your code changes are not compliant with the formatting, you can run: |
| ```shell |
| make format |
| ``` |
| from the root of the VTR source tree. |
| This will automatically reformat your code to be compliant with formatting requirements (this requires the `clang-format` tool to be available on your system). |
| |
| ## Large Scale Reformatting |
| |
| For large scale reformatting (should only be performed by VTR maintainers) the script `dev/autoformat.py` can be used to reformat the code and commit it as 'VTR Robot', which keeps the revision history clearer and records metadata about reformatting commits (which allows `git hyper-blame` to skip such commits). |
| |
| # Running Tests |
| |
| VTR has a variety of tests which are used to check for correctness, performance and Quality of Result (QoR). |
| |
| There are 4 main regression tests: |
| |
| * `vtr_reg_basic`: ~1 minute serial |
| |
| **Goal:** Fast functionality check |
| |
| **Feature Coverage:** Low |
| |
| **Benchmarks:** A few small and simple circuits |
| |
| **Architectures:** A few simple architectures |
| |
| This regression test is *not* suitable for evaluating QoR or performance. |
| It's primary purpose is to make sure the various tools do not crash/fail in the basic VTR flow. |
| |
| QoR checks in this regression test are primarily 'canary' checks to catch gross degredations in QoR. |
| Ocassionally, code changes can cause QoR failures (e.g. due to CAD noise -- particularly on small benchmarks); usually such failures are not a concern if the QoR differences are small. |
| |
| * `vtr_reg_strong`: ~20 minutes serial, ~15 minutes with `-j4` |
| |
| **Goal:** Broad functionaly check |
| |
| **Feature Coverage:** High |
| |
| **Benchmarks:** A few small circuits, with some special benchmarks to exercise specific features |
| |
| **Architectures:** A variety of architectures, including special architectures to exercise specific features |
| |
| This regression test is *not* suitable for evaluating QoR or performance. |
| It's primary purpose is try and achieve high functionality coverage. |
| |
| QoR checks in this regression test are primarily 'canary' checks to catch gross degredations in QoR. |
| Ocassionally, changes can cause QoR failures (e.g. due to CAD noise -- particularly on small benchmarks); usually such failures are not a concern if the QoR differences are small. |
| |
| * `vtr_reg_nightly`: ~6 hours with `-j3` |
| |
| **Goal:** Basic QoR and Performance evaluation. |
| |
| **Feature Coverage:** Medium |
| |
| **Benchmarks:** Small-medium size, diverse. Includes: |
| |
| * MCNC20 benchmarks |
| * VTR benchmarks |
| * Titan 'other' benchmarks (smaller than Titan23) |
| |
| **Architectures:** A wider variety of architectures |
| |
| QoR checks in this regression are aimed at evaluating quality and run-time of the VTR flow. |
| As a result any QoR failures are a concern and should be investigated and understood. |
| |
| * `vtr_reg_weekly`: ~42 hours with `-j4` |
| |
| **Goal:** Full QoR and Performance evaluation. |
| |
| **Feature Coverage:** Medium |
| |
| **Benchmarks:** Medium-Large size, diverse. Includes: |
| |
| * VTR benchmarks |
| * Titan23 benchmarks |
| |
| **Architectures:** A wide variety of architectures |
| |
| QoR checks in this regression are aimed at evaluating quality and run-time of the VTR flow. |
| As a result any QoR failures are a concern and should be investigated and understood. |
| |
| These can be run with `run_reg_test.pl`: |
| ```shell |
| #From the VTR root directory |
| $ ./run_reg_test.pl vtr_reg_basic |
| $ ./run_reg_test.pl vtr_reg_strong |
| ``` |
| |
| The *nightly* and *weekly* regressions require the Titan benchmarks which can be integrated into your VTR tree with: |
| ```shell |
| make get_titan_benchmarks |
| ``` |
| They can then be run using `run_reg_test.pl`: |
| ```shell |
| $ ./run_reg_test.pl vtr_reg_nightly |
| $ ./run_reg_test.pl vtr_reg_weekly |
| ``` |
| |
| To speed-up things up, individual sub-tests can be run in parallel using the `-j` option: |
| ```shell |
| #Run up to 4 tests in parallel |
| $ ./run_reg_test.pl vtr_reg_strong -j4 |
| ``` |
| |
| You can also run multiple regression tests together: |
| ```shell |
| #Run both the basic and strong regression, with up to 4 tests in parallel |
| $ ./run_reg_test.pl vtr_reg_basic vtr_reg_strong -j4 |
| ``` |
| |
| ## Odin Functionality Tests |
| |
| Odin has its own set of tests to verify the correctness of its synthesis results: |
| |
| * `odin_reg_micro`: ~2 minutes serial |
| * `odin_reg_full`: ~6 minutes serial |
| |
| These can be run with: |
| ```shell |
| #From the VTR root directory |
| $ ./run_reg_test.pl odin_reg_micro |
| $ ./run_reg_test.pl odin_reg_full |
| ``` |
| and should be used when makeing changes to Odin. |
| |
| ## Unit Tests |
| |
| VTR also has a limited set of unit tests, which can be run with: |
| ```shell |
| #From the VTR root directory |
| $ make && make test |
| ``` |
| |
| # Debugging Failed Tests |
| |
| If a test fails you probably want to look at the log files to determine the cause. |
| |
| Lets assume we have a failure in `vtr_reg_basic`: |
| |
| ```shell |
| #In the VTR root directory |
| $ ./run_reg_test.pl vtr_reg_strong |
| #Output trimmed... |
| regression_tests/vtr_reg_basic/basic_no_timing |
| ----------------------------------------- |
| k4_N10_memSize16384_memData64/ch_intrinsics/common failed: vpr |
| k4_N10_memSize16384_memData64/diffeq1/common failed: vpr |
| #Output trimmed... |
| regression_tests/vtr_reg_basic/basic_no_timing...[Fail] |
| k4_N10_memSize16384_memData64.xml/ch_intrinsics.v vpr_status: golden = success result = exited |
| #Output trimmed... |
| Error: 10 tests failed! |
| ``` |
| |
| Here we can see that `vpr` failed, which caused subsequent QoR failures (`[Fail]`), and resulted in 10 total errors. |
| |
| To see the log files we need to find the run directory. |
| We can see from the output that the specific test which failed was `regression_tests/vtr_reg_basic/basic_no_timing`. |
| All the regression tests take place under `vtr_flow/tasks`, so the test directory is `vtr_flow/tasks/regression_tests/vtr_reg_basic/basic_no_timing`. |
| Lets move to that directory: |
| ```shell |
| #From the VTR root directory |
| $ cd vtr_flow/tasks/regression_tests/vtr_reg_basic/basic_no_timing |
| $ ls |
| config run001 run003 |
| latest run002 run004 run005 |
| ``` |
| |
| There we see there is a `config` directory (which defines the test), and a set of run-directories. |
| Each time a test is run it creates a new `runXXX` directory (where `XXX` is an incrementing number). |
| From the above we can tell that our last run was `run005` (the symbolic link `latest` also points to the most recent run directory). |
| From the output of `run_reg_test.pl` we know that one of the failing architecture/circuit/parameters combinations was `k4_N10_memSize16384_memData64/ch_intrinsics/common`. |
| Each architecture/circuit/parameter combination is run in its own sub-folder. |
| Lets move to that directory: |
| ```shell |
| $ cd run005/k4_N10_memSize16384_memData64/ch_intrinsics/common |
| $ ls |
| abc.out k4_N10_memSize16384_memData64.xml qor_results.txt |
| ch_intrinsics.net odin.out thread_1.out |
| ch_intrinsics.place output.log vpr.out |
| ch_intrinsics.pre-vpr.blif output.txt vpr_stdout.log |
| ch_intrinsics.route parse_results.txt |
| ``` |
| |
| Here we can see the individual log files produced by each tool (e.g. `vpr.out`), which we can use to guide our debugging. |
| We could also manually re-run the tools (e.g. with a debugger) using files in this directory. |
| |
| # Evaluating Quality of Result (QoR) Changes |
| VTR uses highly tuned and optimized algorithms and data structures. |
| Changes which effect these can have significant impacts on the quality of VTR's design implementations (timing, area etc.) and VTR's run-time/memory usage. |
| Such changes need to be evaluated carefully before they are pushed/merged to ensure no quality degredation occurs. |
| |
| If you are unsure of what level of QoR evaluation is neccessary for your changes, please ask a VTR developer for guidance. |
| |
| ## General QoR Evaluation Principles |
| The goal of performing a QoR evaluation is to measure precisely the impact of a set of code/architecture/benchmark changes on both the quality of VTR's design implemenation (i.e. the result of VTR's optimizations), and on tool run-time and memory usage. |
| |
| This process is made more challenging by the fact that many of VTR's optimization algorithms are based on heuristics (some of which depend on randomization). |
| This means that VTR's implementation results are dependent upon: |
| * The initial conditions (e.g. input architecture & netlist, random number generator seed), and |
| * The precise optimization algorithms used. |
| |
| The result is that a minor change to either of these can can make the measured QoR change. |
| This effect can be viewed as an intrinsic 'noise' or 'variance' to any QoR measurement for a particular architecture/benchmark/algorithm combination. |
| |
| There are typically two key methods used to measure the 'true' QoR: |
| |
| 1. Averaging metrics accross multiple architectures and benchmark circuits. |
| |
| 2. Averaging metrics multiple runs of the same architecture and benchmark, but using different random number generator seeds |
| |
| This is a further variance reduction technique, although it can be very CPU-time intensive. |
| A typical example would be to sweep an entire benchmark set accross 3 or 5 different seeds. |
| |
| In practise any algorithm changes will likely cause improvements on some architecture/benchmark combinations, and degredations on others. |
| As a result we primarily focus on the *average* behaviour of a change to evaluate its impact. |
| However extreme outlier behaviour on particular circuits is also important, since it may indicate bugs or other unexpected behaviour. |
| |
| ### Key QoR Metrics |
| |
| The following are key QoR metrics which should be used to evaluate the impact of changes in VTR. |
| |
| Implementation Quality Metrics: |
| |
| | Metric | Meaning | Sensitivity | |
| |-----------------------------|--------------------------------------------------------------------------|-------------| |
| | num_pre_packed_blocks | Number of primitive netlist blocks (after tech. mapping, before packing) | Low | |
| | num_post_packed_blocks | Number of Clustered Blocks (after packing) | Medium | |
| | device_grid_tiles | FPGA size in grid tiles | Low-Medium | |
| | min_chan_width | The minimum routable channel width | Medium\* | |
| | crit_path_routed_wirelength | The routed wirelength at the relaxed channel width | Medium | |
| | critical_path_delay | The critical path delay at the relaxed channel width | Medium-High | |
| |
| \* By default, VPR attempts to find the minimum routable channel width; it then performs routing at a relaxed (e.g. 1.3x minimum) channel width. At minimum channel width routing congestion can distort the true timing/wirelength characteristics. Combined with the fact that most FPGA architectures are built with an abundance of routing, post-routing metrics are usually only evaluated at the relaxed channel width. |
| |
| Run-time/Memory Usage Metrics: |
| |
| | Metric | Meaning | Sensitivity | |
| |-----------------------------|---------------------------------------------------------------------------|-------------| |
| | vtr_flow_elapsed_time | Wall-clock time to complete the VTR flow | Low | |
| | pack_time | Wall-clock time VPR spent during packing | Low | |
| | place_time | Wall-clock time VPR spent during placement | Low | |
| | min_chan_width_route_time | Wall-clock time VPR spent during routing at the relaxed channel width | High\* | |
| | crit_path_route_time | Wall-clock time VPR spent during routing at the relaxed channel width | Low | |
| | max_vpr_mem | Maximum memory used by VPR (in kilobytes) | Low | |
| |
| \* Note that the minimum channel width route time is chaotic and can be highly variable (e.g. 10x variation is not unusual). Minimum channel width routing performs a binary search to find the minimum channel width. Since route time is highly dependent on congestion, run-time is highly dependent on the precise channel widths searched (which may change due to perturbations). |
| |
| In practise you will likely want to consider additional and more detailed metrics, particularly those directly related to the changes you are making. |
| For example, if your change related to hold-time optimization you would want to include hold-time related metrics such as `hold_TNS` (hold total negative slack) and `hold_WNS` (hold worst negative slack). |
| If your change related to packing, you would want to report additional packing-related metrics, such as the number of clusters formed by each block type (e.g. numbers of CLBs, RAMs, DSPs, IOs). |
| |
| ### Benchmark Selection |
| |
| An important factor in performing any QoR evaluation is the benchmark set selected. |
| In order to draw reasonably general conclusions about the impact of a change we desire two characteristics of the benchmark set: |
| |
| 1. It includes a large number of benchmarks which are representative of the application domains of interest. |
| |
| This ensures we don't over-tune to a specific benchmark or application domain. |
| |
| 2. It should include benchmarks of large sizes. |
| |
| This ensures we can optimize and scale to large problem spaces. |
| |
| In practise (for various reasons) satisfying both of these goals simultaneously is challenging. |
| The key goal here is to ensure the benchmark set is not unreasonably biased in some manner (e.g. benchmarks which are too small, benchmarks too skewed to a particular application domain). |
| |
| ### Fairly measuring tool run-time |
| Accurately and fairly measuring the run-time of computer programs is challenging in practise. |
| A variety of factors effect run-time including: |
| |
| * Operating System |
| * System load (e.g. other programs running) |
| * Variance in hardware performance (e.g. different CPUs on different machines, CPU frequency scaling) |
| |
| To make reasonably 'fair' run-time comparisons it is important to isolate the change as much as possible from other factors. |
| This involves keeping as much of the experimental environment identical as possible including: |
| |
| 1. Target benchmarks |
| 2. Target architecture |
| 3. Code base (e.g. VTR revision) |
| 4. CAD parameters |
| 5. Computer system (e.g. CPU model, CPU frequency/power scaling, OS version) |
| 6. Compiler version |
| |
| ## Collecting QoR Measurements |
| The first step is to collect QoR metrics on your selected benchmark set. |
| |
| You need at least two sets of QoR measurements: |
| 1. The baseline QoR (i.e. unmodified VTR). |
| 2. The modified QoR (i.e. VTR with your changes). |
| |
| Note that it is important to generate both sets of QoR measurements on the same computing infrastructure to ensure a fair run-time comparison. |
| |
| The following examples show how a single set of QoR measurements can be produced using the VTR flow infrastructure. |
| |
| ### Example: VTR Benchmarks QoR Measurement |
| |
| The VTR benchmarks are a group of benchmark circuits distributed with the VTR project. |
| The are provided as synthesizable verilog and can be re-mapped to VTR supported architectures. |
| They consist mostly of small to medium sized circuits from a mix of application domains. |
| They are used primarily to evaluate the VTR's optimization quality in an architecture exploration/evaluation setting (e.g. determining minimum channel widths). |
| |
| A typical approach to evaluating an algorithm change would be to run `vtr_reg_qor_chain` task from the nightly regression test: |
| |
| ```shell |
| #From the VTR root |
| $ cd vtr_flow/tasks |
| |
| #Run the VTR benchmarks |
| $ ../scripts/run_vtr_task.pl regression_tests/vtr_reg_nightly/vtr_reg_qor_chain |
| |
| #Several hours later... they complete |
| |
| #Parse the results |
| $ ../scripts/parse_vtr_task.pl regression_tests/vtr_reg_nightly/vtr_reg_qor_chain |
| |
| #The run directory should now contain a summary parse_results.txt file |
| $ head -5 vtr_reg_nightly/vtr_reg_qor_chain/latest/parse_results.txt |
| arch circuit script_params vpr_revision vpr_status error num_pre_packed_nets num_pre_packed_blocks num_post_packed_nets num_post_packed_blocks device_width device_height num_clb num_io num_outputs num_memoriesnum_mult placed_wirelength_est placed_CPD_est placed_setup_TNS_est placed_setup_WNS_est min_chan_width routed_wirelength min_chan_width_route_success_iteration crit_path_routed_wirelength crit_path_route_success_iteration critical_path_delay setup_TNS setup_WNS hold_TNS hold_WNS logic_block_area_total logic_block_area_used min_chan_width_routing_area_total min_chan_width_routing_area_per_tile crit_path_routing_area_total crit_path_routing_area_per_tile odin_synth_time abc_synth_time abc_cec_time abc_sec_time ace_time pack_time place_time min_chan_width_route_time crit_path_route_time vtr_flow_elapsed_time max_vpr_mem max_odin_mem max_abc_mem |
| k6_frac_N10_frac_chain_mem32K_40nm.xml bgm.v common 9f591f6-dirty success 26431 24575 14738 2258 53 53 1958 257 32 0 11 871090 18.5121 -13652.6 -18.5121 84 328781 32 297718 18 20.4406 -15027.8 -20.4406 0 0 1.70873e+08 1.09883e+08 1.63166e+07 5595.54 2.07456e+07 7114.41 11.16 1.03 -1 -1 -1 141.53 108.26 142.42 15.63 652.17 1329712 528868 146796 |
| k6_frac_N10_frac_chain_mem32K_40nm.xml blob_merge.v common 9f591f6-dirty success 14163 11407 3445 700 30 30 564 36 100 0 0 113369 13.4111 -2338.12 -13.4111 64 80075 18 75615 23 15.3479 -2659.17 -15.3479 0 0 4.8774e+07 3.03962e+07 3.87092e+06 4301.02 4.83441e+06 5371.56 0.46 0.17 -1 -1 -1 67.89 11.30 47.60 3.48 198.58 307756 48148 58104 |
| k6_frac_N10_frac_chain_mem32K_40nm.xml boundtop.v common 9f591f6-dirty success 1071 1141 595 389 13 13 55 142 192 0 0 5360 3.2524 -466.039 -3.2524 34 4534 15 3767 12 3.96224 -559.389 -3.96224 0 0 6.63067e+06 2.96417e+06 353000. 2088.76 434699. 2572.18 0.29 0.11 -1 -1 -1 2.55 0.82 2.10 0.15 7.24 87552 38484 37384 |
| k6_frac_N10_frac_chain_mem32K_40nm.xml ch_intrinsics.v common 9f591f6-dirty success 363 493 270 247 10 10 17 99 130 1 0 1792 1.86527 -194.602 -1.86527 46 1562 13 1438 20 2.4542 -226.033 -2.4542 0 0 3.92691e+06 1.4642e+06 259806. 2598.06 333135. 3331.35 0.03 0.01 -1 -1 -1 0.46 0.31 0.94 0.09 2.59 62684 8672 32940 |
| ``` |
| |
| ### Example: Titan Benchmarks QoR Measurements |
| |
| The Titan benchmarks are a group of large benchmark circuits from a wide range of applications, which are compatible with the VTR project. |
| The are typically used as post-technology mapped netlists which have been pre-synthesized with Quartus. |
| They are substantially larger and more realistic than the VTR benchmarks, but can only target specificly compatible architectures. |
| They are used primarily to evaluate the optimization quality and scalability of VTR's CAD algorithms while targetting a fixed architecture (e.g. at a fixed channel width). |
| |
| A typical approach to evaluating an algorithm change would be to run `vtr_reg_titan` task from the weekly regression test: |
| |
| ```shell |
| #From the VTR root |
| |
| #Download and integrate the Titan benchmarks into the VTR source tree |
| $ make get_titan_benchmarks |
| |
| #Move to the task directory |
| $ cd vtr_flow/tasks |
| |
| #Run the VTR benchmarks |
| $ ../scripts/run_vtr_task.pl regression_tests/vtr_reg_weekly/vtr_reg_titan |
| |
| #Several days later... they complete |
| |
| #Parse the results |
| $ ../scripts/parse_vtr_task.pl regression_tests/vtr_reg_weekly/vtr_reg_titan |
| |
| #The run directory should now contain a summary parse_results.txt file |
| $ head -5 vtr_reg_nightly/vtr_reg_qor_chain/latest/parse_results.txt |
| arch circuit vpr_revision vpr_status error num_pre_packed_nets num_pre_packed_blocks num_post_packed_nets num_post_packed_blocks device_width device_height num_clb num_io num_outputs num_memoriesnum_mult placed_wirelength_est placed_CPD_est placed_setup_TNS_est placed_setup_WNS_est routed_wirelength crit_path_route_success_iteration logic_block_area_total logic_block_area_used routing_area_total routing_area_per_tile critical_path_delay setup_TNS setup_WNS hold_TNS hold_WNS pack_time place_time crit_path_route_time max_vpr_mem max_odin_mem max_abc_mem |
| stratixiv_arch.timing.xml neuron_stratixiv_arch_timing.blif 0208312 success 119888 86875 51408 3370 128 95 -1 42 35 -1 -1 3985635 8.70971 -234032 -8.70971 1086419 20 0 0 2.66512e+08 21917.1 9.64877 -262034 -9.64877 0 0 127.92 218.48 259.96 5133800 -1 -1 |
| stratixiv_arch.timing.xml sparcT1_core_stratixiv_arch_timing.blif 0208312 success 92813 91974 54564 4170 77 57 -1 173 137 -1 -1 3213593 7.87734 -534295 -7.87734 1527941 43 0 0 9.64428e+07 21973.8 9.06977 -625483 -9.06977 0 0 327.38 338.65 364.46 3690032 -1 -1 |
| stratixiv_arch.timing.xml stereo_vision_stratixiv_arch_timing.blif 0208312 success 127088 94088 62912 3776 128 95 -1 326 681 -1 -1 4875541 8.77339 -166097 -8.77339 998408 16 0 0 2.66512e+08 21917.1 9.36528 -187552 -9.36528 0 0 110.03 214.16 189.83 5048580 -1 -1 |
| stratixiv_arch.timing.xml cholesky_mc_stratixiv_arch_timing.blif 0208312 success 140214 108592 67410 5444 121 90 -1 111 151 -1 -1 5221059 8.16972 -454610 -8.16972 1518597 15 0 0 2.38657e+08 21915.3 9.34704 -531231 -9.34704 0 0 211.12 364.32 490.24 6356252 -1 -1 |
| ``` |
| |
| ## Comparing QoR Measurements |
| Once you have two (or more) sets of QoR measurements they now need to be compared. |
| |
| A general method is as follows: |
| 1. Normalize all metrics to the values in the baseline measurements (this makes the relative changes easy to evaluate) |
| 2. Produce tables for each set of QoR measurements showing the per-benchmark relative values for each metric |
| 3. Calculate the GEOMEAN over all benchmarks for each normalized metric |
| 4. Produce a summary table showing the Metric Geomeans for each set of QoR measurments |
| |
| ### QoR Comparison Gotchas |
| There are a variety of 'gotchas' you need to avoid to ensure fair comparisons: |
| * GEOMEAN's must be over the same set of benchmarks . |
| A common issue is that a benchmark failed to complete for some reason, and it's metric values are missing |
| |
| * Run-times need to be collected on the same compute infrastructure at the same system load (ideally unloaded). |
| |
| ### Example QoR Comparison |
| Suppose we've make a change to VTR, and we now want to evaluate the change. |
| As described above we produce QoR measurements for both the VTR baseline, and our modified version. |
| |
| We then have the following (hypothetical) QoR Metrics. |
| |
| **Baseline QoR Metrics:** |
| |
| | arch | circuit | num_pre_packed_blocks | num_post_packed_blocks | device_grid_tiles | min_chan_width | crit_path_routed_wirelength | critical_path_delay | vtr_flow_elapsed_time | pack_time | place_time | min_chan_width_route_time | crit_path_route_time | max_vpr_mem | |
| |----------------------------------------|--------------------|-----------------------|------------------------|-------------------|----------------|-----------------------------|---------------------|-----------------------|-----------|------------|---------------------------|----------------------|-------------| |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | bgm.v | 24575 | 2258 | 2809 | 84 | 297718 | 20.4406 | 652.17 | 141.53 | 108.26 | 142.42 | 15.63 | 1329712 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | blob_merge.v | 11407 | 700 | 900 | 64 | 75615 | 15.3479 | 198.58 | 67.89 | 11.3 | 47.6 | 3.48 | 307756 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | boundtop.v | 1141 | 389 | 169 | 34 | 3767 | 3.96224 | 7.24 | 2.55 | 0.82 | 2.1 | 0.15 | 87552 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | ch_intrinsics.v | 493 | 247 | 100 | 46 | 1438 | 2.4542 | 2.59 | 0.46 | 0.31 | 0.94 | 0.09 | 62684 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | diffeq1.v | 886 | 313 | 256 | 60 | 9624 | 17.9648 | 15.59 | 2.45 | 1.36 | 9.93 | 0.93 | 86524 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | diffeq2.v | 599 | 201 | 256 | 52 | 8928 | 13.7083 | 13.14 | 1.41 | 0.87 | 9.14 | 0.94 | 85760 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | LU8PEEng.v | 31396 | 2286 | 2916 | 100 | 348085 | 79.4512 | 1514.51 | 175.67 | 153.01 | 1009.08 | 45.47 | 1410872 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | LU32PEEng.v | 101542 | 7251 | 9216 | 158 | 1554942 | 80.062 | 28051.68 | 625.03 | 930.58 | 25050.73 | 251.87 | 4647936 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | mcml.v | 165809 | 6767 | 8649 | 128 | 1311825 | 51.1905 | 9088.1 | 524.8 | 742.85 | 4001.03 | 127.42 | 4999124 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | mkDelayWorker32B.v | 4145 | 1327 | 2500 | 38 | 30086 | 8.39902 | 65.54 | 7.73 | 15.39 | 26.19 | 3.23 | 804720 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | mkPktMerge.v | 1160 | 516 | 784 | 44 | 13370 | 4.4408 | 21.75 | 2.45 | 2.14 | 13.95 | 1.96 | 122872 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | mkSMAdapter4B.v | 2852 | 548 | 400 | 48 | 19274 | 5.26765 | 47.64 | 16.22 | 4.16 | 19.95 | 1.14 | 116012 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | or1200.v | 4530 | 1321 | 729 | 62 | 51633 | 9.67406 | 105.62 | 33.37 | 12.93 | 44.95 | 3.33 | 219376 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | raygentop.v | 2934 | 710 | 361 | 58 | 22045 | 5.14713 | 39.72 | 9.54 | 4.06 | 19.8 | 2.34 | 126056 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | sha.v | 3024 | 236 | 289 | 62 | 16653 | 10.0144 | 390.89 | 11.47 | 2.7 | 6.18 | 0.75 | 117612 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | stereovision0.v | 21801 | 1122 | 1156 | 58 | 64935 | 3.63177 | 82.74 | 20.45 | 15.49 | 24.5 | 2.6 | 411884 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | stereovision1.v | 19538 | 1096 | 1600 | 100 | 143517 | 5.61925 | 272.41 | 26.99 | 18.15 | 149.46 | 15.49 | 676844 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | stereovision2.v | 42078 | 2534 | 7396 | 134 | 650583 | 15.3151 | 3664.98 | 66.72 | 119.26 | 3388.7 | 62.6 | 3114880 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | stereovision3.v | 324 | 55 | 49 | 30 | 768 | 2.66429 | 2.25 | 0.75 | 0.2 | 0.57 | 0.05 | 61148 | |
| |
| **Modified QoR Metrics:** |
| |
| | arch | circuit | num_pre_packed_blocks | num_post_packed_blocks | device_grid_tiles | min_chan_width | crit_path_routed_wirelength | critical_path_delay | vtr_flow_elapsed_time | pack_time | place_time | min_chan_width_route_time | crit_path_route_time | max_vpr_mem | |
| |----------------------------------------|--------------------|-----------------------|------------------------|-------------------|----------------|-----------------------------|---------------------|-----------------------|-----------|------------|---------------------------|----------------------|-------------| |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | bgm.v | 24575 | 2193 | 2809 | 82 | 303891 | 20.414 | 642.01 | 70.09 | 113.58 | 198.09 | 16.27 | 1222072 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | blob_merge.v | 11407 | 684 | 900 | 72 | 77261 | 14.6676 | 178.16 | 34.31 | 13.38 | 57.89 | 3.35 | 281468 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | boundtop.v | 1141 | 369 | 169 | 40 | 3465 | 3.5255 | 4.48 | 1.13 | 0.7 | 0.9 | 0.17 | 82912 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | ch_intrinsics.v | 493 | 241 | 100 | 54 | 1424 | 2.50601 | 1.75 | 0.19 | 0.27 | 0.43 | 0.09 | 60796 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | diffeq1.v | 886 | 293 | 256 | 50 | 9972 | 17.3124 | 15.24 | 0.69 | 0.97 | 11.27 | 1.44 | 72204 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | diffeq2.v | 599 | 187 | 256 | 50 | 7621 | 13.1714 | 14.14 | 0.63 | 1.04 | 10.93 | 0.78 | 68900 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | LU8PEEng.v | 31396 | 2236 | 2916 | 98 | 349074 | 77.8611 | 1269.26 | 88.44 | 153.25 | 843.31 | 49.13 | 1319276 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | LU32PEEng.v | 101542 | 6933 | 9216 | 176 | 1700697 | 80.1368 | 28290.01 | 306.21 | 897.95 | 25668.4 | 278.74 | 4224048 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | mcml.v | 165809 | 6435 | 8649 | 124 | 1240060 | 45.6693 | 9384.4 | 296.99 | 686.27 | 4782.43 | 99.4 | 4370788 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | mkDelayWorker32B.v | 4145 | 1207 | 2500 | 36 | 33354 | 8.3986 | 53.94 | 3.85 | 14.75 | 19.53 | 2.95 | 785316 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | mkPktMerge.v | 1160 | 494 | 784 | 36 | 13881 | 4.57189 | 20.75 | 0.82 | 1.97 | 15.01 | 1.88 | 117636 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | mkSMAdapter4B.v | 2852 | 529 | 400 | 56 | 19817 | 5.21349 | 27.58 | 5.05 | 2.66 | 14.65 | 1.11 | 103060 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | or1200.v | 4530 | 1008 | 729 | 76 | 48034 | 8.70797 | 202.25 | 10.1 | 8.31 | 171.96 | 2.86 | 178712 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | raygentop.v | 2934 | 634 | 361 | 58 | 20799 | 5.04571 | 22.58 | 2.75 | 2.42 | 12.86 | 1.64 | 108116 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | sha.v | 3024 | 236 | 289 | 62 | 16052 | 10.5007 | 337.19 | 5.32 | 2.25 | 4.52 | 0.69 | 105948 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | stereovision0.v | 21801 | 1121 | 1156 | 58 | 70046 | 3.61684 | 86.5 | 9.5 | 15.02 | 41.81 | 2.59 | 376100 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | stereovision1.v | 19538 | 1080 | 1600 | 92 | 142805 | 6.02319 | 343.83 | 10.68 | 16.21 | 247.99 | 11.66 | 480352 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | stereovision2.v | 42078 | 2416 | 7396 | 124 | 646793 | 14.6606 | 5614.79 | 34.81 | 107.66 | 5383.58 | 62.27 | 2682976 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | stereovision3.v | 324 | 54 | 49 | 34 | 920 | 2.5281 | 1.55 | 0.31 | 0.14 | 0.43 | 0.05 | 63444 | |
| |
| Based on these metrics we then calculate the following ratios and summary. |
| |
| **QoR Metric Ratio** (Modified QoR / Baseline QoR): |
| |
| | arch | circuit | num_pre_packed_blocks | num_post_packed_blocks | device_grid_tiles | min_chan_width | crit_path_routed_wirelength | critical_path_delay | vtr_flow_elapsed_time | pack_time | place_time | min_chan_width_route_time | crit_path_route_time | max_vpr_mem | |
| |----------------------------------------|--------------------|-----------------------|------------------------|-------------------|----------------|-----------------------------|---------------------|-----------------------|-----------|------------|---------------------------|----------------------|-------------| |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | bgm.v | 1.00 | 0.97 | 1.00 | 0.98 | 1.02 | 1.00 | 0.98 | 0.50 | 1.05 | 1.39 | 1.04 | 0.92 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | blob_merge.v | 1.00 | 0.98 | 1.00 | 1.13 | 1.02 | 0.96 | 0.90 | 0.51 | 1.18 | 1.22 | 0.96 | 0.91 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | boundtop.v | 1.00 | 0.95 | 1.00 | 1.18 | 0.92 | 0.89 | 0.62 | 0.44 | 0.85 | 0.43 | 1.13 | 0.95 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | ch_intrinsics.v | 1.00 | 0.98 | 1.00 | 1.17 | 0.99 | 1.02 | 0.68 | 0.41 | 0.87 | 0.46 | 1.00 | 0.97 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | diffeq1.v | 1.00 | 0.94 | 1.00 | 0.83 | 1.04 | 0.96 | 0.98 | 0.28 | 0.71 | 1.13 | 1.55 | 0.83 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | diffeq2.v | 1.00 | 0.93 | 1.00 | 0.96 | 0.85 | 0.96 | 1.08 | 0.45 | 1.20 | 1.20 | 0.83 | 0.80 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | LU8PEEng.v | 1.00 | 0.98 | 1.00 | 0.98 | 1.00 | 0.98 | 0.84 | 0.50 | 1.00 | 0.84 | 1.08 | 0.94 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | LU32PEEng.v | 1.00 | 0.96 | 1.00 | 1.11 | 1.09 | 1.00 | 1.01 | 0.49 | 0.96 | 1.02 | 1.11 | 0.91 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | mcml.v | 1.00 | 0.95 | 1.00 | 0.97 | 0.95 | 0.89 | 1.03 | 0.57 | 0.92 | 1.20 | 0.78 | 0.87 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | mkDelayWorker32B.v | 1.00 | 0.91 | 1.00 | 0.95 | 1.11 | 1.00 | 0.82 | 0.50 | 0.96 | 0.75 | 0.91 | 0.98 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | mkPktMerge.v | 1.00 | 0.96 | 1.00 | 0.82 | 1.04 | 1.03 | 0.95 | 0.33 | 0.92 | 1.08 | 0.96 | 0.96 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | mkSMAdapter4B.v | 1.00 | 0.97 | 1.00 | 1.17 | 1.03 | 0.99 | 0.58 | 0.31 | 0.64 | 0.73 | 0.97 | 0.89 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | or1200.v | 1.00 | 0.76 | 1.00 | 1.23 | 0.93 | 0.90 | 1.91 | 0.30 | 0.64 | 3.83 | 0.86 | 0.81 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | raygentop.v | 1.00 | 0.89 | 1.00 | 1.00 | 0.94 | 0.98 | 0.57 | 0.29 | 0.60 | 0.65 | 0.70 | 0.86 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | sha.v | 1.00 | 1.00 | 1.00 | 1.00 | 0.96 | 1.05 | 0.86 | 0.46 | 0.83 | 0.73 | 0.92 | 0.90 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | stereovision0.v | 1.00 | 1.00 | 1.00 | 1.00 | 1.08 | 1.00 | 1.05 | 0.46 | 0.97 | 1.71 | 1.00 | 0.91 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | stereovision1.v | 1.00 | 0.99 | 1.00 | 0.92 | 1.00 | 1.07 | 1.26 | 0.40 | 0.89 | 1.66 | 0.75 | 0.71 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | stereovision2.v | 1.00 | 0.95 | 1.00 | 0.93 | 0.99 | 0.96 | 1.53 | 0.52 | 0.90 | 1.59 | 0.99 | 0.86 | |
| | k6_frac_N10_frac_chain_mem32K_40nm.xml | stereovision3.v | 1.00 | 0.98 | 1.00 | 1.13 | 1.20 | 0.95 | 0.69 | 0.41 | 0.70 | 0.75 | 1.00 | 1.04 | |
| | | GEOMEAN | 1.00 | 0.95 | 1.00 | 1.02 | 1.01 | 0.98 | 0.92 | 0.42 | 0.87 | 1.03 | 0.96 | 0.89 | |
| |
| **QoR Summary:** |
| |
| | | baseline | modified | |
| |-----------------------------|----------|----------| |
| | num_pre_packed_blocks | 1.00 | 1.00 | |
| | num_post_packed_blocks | 1.00 | 0.95 | |
| | device_grid_tiles | 1.00 | 1.00 | |
| | min_chan_width | 1.00 | 1.02 | |
| | crit_path_routed_wirelength | 1.00 | 1.01 | |
| | critical_path_delay | 1.00 | 0.98 | |
| | vtr_flow_elapsed_time | 1.00 | 0.92 | |
| | pack_time | 1.00 | 0.42 | |
| | place_time | 1.00 | 0.87 | |
| | min_chan_width_route_time | 1.00 | 1.03 | |
| | crit_path_route_time | 1.00 | 0.96 | |
| | max_vpr_mem | 1.00 | 0.89 | |
| |
| From the results we can see that our change, on average, achieved a small reduction in the number of logic blocks (0.95) in return for a 2% increase in minimum channel width and 1% increase in routed wirelength. From a run-time persepective the packer is substantially faster (0.42). |
| |
| ### Automated QoR Comparison Script |
| To automate some of the QoR comparison VTR includes a script to compare pares_resutls.txt files and generate a spreadsheet including the ratio and summary tables. |
| |
| For example: |
| ```shell |
| #From the VTR Root |
| $ ./vtr_flow/scripts/qor_compare.py parse_results1.txt parse_results2.txt parse_results3.txt -o comparison.xlsx |
| ``` |
| will produce ratio tables and a summary table for the files parse_results1.txt, parse_results2.txt and parse_results3.txt, where the first file (parse_results1.txt) is assumed to be the baseline used to produce normalized ratios. |
| |
| # Adding Tests |
| |
| Any time you add a feature to VTR you **must** add a test which exercies the feature. |
| This ensures that regression tests will detect if the feature breaks in the future. |
| |
| Consider which regression test suite your test should be added to (see [Running Tests](#running-tests) descriptions). |
| |
| Typically, test which exercise new features should be added to `vtr_reg_strong`. |
| These tests should use small benchmarks to ensure they: |
| * run quickly (so they get run often!), and |
| * are easier to debug. |
| If your test will take more than ~1 minute it should probably go in a longer running regression test (but see first if you can create a smaller testcase first). |
| |
| ## Adding a test to vtr_reg_strong |
| This describes adding a test to `vtr_reg_strong`, but the process is similar for the other regression tests. |
| |
| 1. Create a configuration file |
| |
| First move to the vtr_reg_strong directory: |
| ```shell |
| #From the VTR root directory |
| $ cd vtr_flow/tasks/regression_tests/vtr_reg_strong |
| $ ls |
| qor_geomean.txt strong_flyover_wires strong_pack_and_place |
| strong_analysis_only strong_fpu_hard_block_arch strong_power |
| strong_bounding_box strong_fracturable_luts strong_route_only |
| strong_breadth_first strong_func_formal_flow strong_scale_delay_budgets |
| strong_constant_outputs strong_func_formal_vpr strong_sweep_constant_outputs |
| strong_custom_grid strong_global_routing strong_timing |
| strong_custom_pin_locs strong_manual_annealing strong_titan |
| strong_custom_switch_block strong_mcnc strong_valgrind |
| strong_echo_files strong_minimax_budgets strong_verify_rr_graph |
| strong_fc_abs strong_multiclock task_list.txt |
| strong_fix_pins_pad_file strong_no_timing task_summary |
| strong_fix_pins_random strong_pack |
| ``` |
| Each folder (prefixed with `strong_` in this case) defines a task (sub-test). |
| |
| Let's make a new task named `strong_mytest`. |
| An easy way is to copy an existing configuration file such as `strong_timing/config/config.txt` |
| ```shell |
| $ mkdir -p strong_mytest/config |
| $ cp strong_timing/config/config.txt strong_mytest/config/. |
| ``` |
| You can now edit `strong_mytest/config/config.txt` to customize your test. |
| |
| 2. Generate golden reference results |
| |
| Now we need to test our new test and generate 'golden' reference results. |
| These will be used to compare future runs of our test to detect any changes in behaviour (e.g. bugs). |
| |
| From the VTR root, we move to the `vtr_flow/tasks` directory, and then run our new test: |
| ```shell |
| #From the VTR root |
| $ cd vtr_flow/tasks |
| $ ../scripts/run_vtr_task.pl regression_tests/vtr_reg_strong/strong_mytest |
| |
| regression_tests/vtr_reg_strong/strong_mytest |
| ----------------------------------------- |
| Current time: Jan-25 06:51 PM. Expected runtime of next benchmark: Unknown |
| k6_frac_N10_mem32K_40nm/ch_intrinsics...OK |
| ``` |
| |
| Next we can generate the golden reference results using `parse_vtr_task.pl` with the `-create_golden` option: |
| ```shell |
| $ ../scripts/parse_vtr_task.pl regression_tests/vtr_reg_strong/strong_mytest -create_golden |
| ``` |
| |
| And check that everything matches with `-check_golden`: |
| ```shell |
| $ ../scripts/parse_vtr_task.pl regression_tests/vtr_reg_strong/strong_mytest -check_golden |
| regression_tests/vtr_reg_strong/strong_mytest...[Pass] |
| ``` |
| |
| 3. Add it to the task list |
| |
| We now need to add our new `strong_mytest` task to the task list, so it is run whenever `vtr_reg_strong` is run. |
| We do this by adding the line `regression_tests/vtr_reg_strong/strong_mytest` to the end of `vtr_reg_strong`'s `task_list.txt`: |
| ```shell |
| #From the VTR root directory |
| $ vim vtr_flow/tasks/regression_tests/vtr_reg_strong/task_list.txt |
| # Add a new line 'regression_tests/vtr_reg_strong/strong_mytest' to the end of the file |
| ``` |
| |
| Now, when we run `vtr_reg_strong`: |
| ```shell |
| #From the VTR root directory |
| $ ./run_reg_test.pl vtr_reg_strong |
| #Output trimmed... |
| regression_tests/vtr_reg_strong/strong_mytest |
| ----------------------------------------- |
| #Output trimmed... |
| ``` |
| we see our test is run. |
| |
| 4. Commit the new test |
| |
| Finally you need to commit your test: |
| ```shell |
| #Add the config.txt and golden_results.txt for the test |
| $ git add vtr_flow/tasks/regression_tests/vtr_reg_strong/strong_mytest/ |
| #Add the change to the task_list.txt |
| $ git add vtr_flow/tasks/regression_tests/vtr_reg_strong/task_list.txt |
| #Commit the changes, when pushed the test will automatically be picked up by BuildBot |
| $ git commit |
| ``` |
| |
| # Debugging Aids |
| VTR has support for several additional tools/features to aid debugging. |
| |
| ## Sanitizers |
| VTR can be compiled using *sanitizers* which will detect invalid memory accesses, memory leaks and undefined behaviour (supported by both GCC and LLVM): |
| ```shell |
| #From the VTR root directory |
| $ cmake -D VTR_ENABLE_SANITIZE=ON build |
| $ make |
| ``` |
| |
| ## Assertion Levels |
| VTR supports configurable assertion levels. |
| |
| The default level (`2`) which turns on most assertions which don't cause significant run-time penalties. |
| |
| This level can be increased: |
| ```shell |
| #From the VTR root directory |
| $ cmake -D VTR_ASSERT_LEVEL=3 build |
| $ make |
| ``` |
| this turns on more extensive assertion checking and re-builds VTR. |
| |
| # External Subtrees |
| VTR includes some code which is developed in external repositories, and is integrated into the VTR source tree using [git subtrees](https://www.atlassian.com/blog/git/alternatives-to-git-submodule-git-subtree). |
| |
| To simplify the process of working with subtrees we use the [`dev/external_subtrees.py`](./dev/external_subtrees.py) script. |
| |
| For instance, running `./dev/external_subtrees.py --list` from the VTR root it shows the subtrees: |
| ``` |
| Component: abc Path: abc URL: https://github.com/berkeley-abc/abc.git URL_Ref: master |
| Component: libargparse Path: libs/EXTERNAL/libargparse URL: https://github.com/kmurray/libargparse.git URL_Ref: master |
| Component: libblifparse Path: libs/EXTERNAL/libblifparse URL: https://github.com/kmurray/libblifparse.git URL_Ref: master |
| Component: libsdcparse Path: libs/EXTERNAL/libsdcparse URL: https://github.com/kmurray/libsdcparse.git URL_Ref: master |
| Component: libtatum Path: libs/EXTERNAL/libtatum URL: https://github.com/kmurray/tatum.git URL_Ref: master |
| ``` |
| |
| Code included in VTR by subtrees should *not be modified within the VTR source tree*. |
| Instead changes should be made in the relevant up-stream repository, and then synced into the VTR tree. |
| |
| ### Updating an existing Subtree |
| 1. From the VTR root run: `./dev/external_subtrees.py $SUBTREE_NAME`, where `$SUBTREE_NAME` is the name of an existing subtree. |
| |
| For example to update the `libtatum` subtree: |
| ```shell |
| ./dev/external_subtrees.py --update libtatum |
| ``` |
| |
| ### Adding a new Subtree |
| |
| To add a new external subtree to VTR do the following: |
| |
| 1. Add the subtree specification to `dev/subtree_config.xml`. |
| |
| For example to add a subtree name `libfoo` from the `master` branch of `https://github.com/kmurray/libfoo.git` to `libs/EXTERNAL/libfoo` you would add: |
| ```xml |
| <subtree |
| name="libfoo" |
| internal_path="libs/EXTERNAL/libfoo" |
| external_url="https://github.com/kmurray/libfoo.git" |
| default_external_ref="master"/> |
| ``` |
| within the existing `<subtrees>` tag. |
| |
| Note that the internal_path directory should not already exist. |
| |
| You can confirm it works by running: `dev/external_subtrees.py --list`: |
| ``` |
| Component: abc Path: abc URL: https://github.com/berkeley-abc/abc.git URL_Ref: master |
| Component: libargparse Path: libs/EXTERNAL/libargparse URL: https://github.com/kmurray/libargparse.git URL_Ref: master |
| Component: libblifparse Path: libs/EXTERNAL/libblifparse URL: https://github.com/kmurray/libblifparse.git URL_Ref: master |
| Component: libsdcparse Path: libs/EXTERNAL/libsdcparse URL: https://github.com/kmurray/libsdcparse.git URL_Ref: master |
| Component: libtatum Path: libs/EXTERNAL/libtatum URL: https://github.com/kmurray/tatum.git URL_Ref: master |
| Component: libfoo Path: libs/EXTERNAL/libfoo URL: https://github.com/kmurray/libfoo.git URL_Ref: master |
| ``` |
| which shows libfoo is now recognized. |
| |
| 2. Run `./dev/external_subtrees.py --update $SUBTREE_NAME` to add the subtree. |
| |
| For the `libfoo` example above this would be: |
| ```shell |
| ./dev/external_subtrees.py --update libfoo |
| ``` |
| |
| This will create two commits to the repository. |
| The first will squash all the upstream changes, the second will merge those changes into the current branch. |
| |
| |
| ### Subtree Rational |
| |
| VTR uses subtrees to allow easy tracking of upstream dependencies. |
| |
| Their main advantages included: |
| * Works out-of-the-box: no actions needed post checkout to pull in dependencies (e.g. no `git submodule update --init --recursive`) |
| * Simplified upstream version tracking |
| * Potential for local changes (although in VTR we do not use this to make keeping in sync easier) |
| |
| See [here](https://blogs.atlassian.com/2013/05/alternatives-to-git-submodule-git-subtree/) for a more detailed discussion. |
| |
| # Finding Bugs with Coverity |
| [Coverity Scan](https://scan.coverity.com) is a static code analysis service which can be used to detect bugs. |
| |
| ### Browsing Defects |
| To view defects detected do the following: |
| |
| 1. Get a coverity scan account |
| |
| Contact a project maintainer for an invitation. |
| |
| 2. Browse the existing defects through the coverity web interface |
| |
| |
| ### Submitting a build |
| To submit a build to coverity do the following: |
| |
| 1. [Download](https://scan.coverity.com/download) the coverity build tool |
| |
| 2. Configure VTR to perform a *debug* build. This ensures that all assertions are enabled, without assertions coverity may report bugs that are gaurded against by assertions. We also set VTR asserts to the highest level. |
| |
| ```shell |
| #From the VTR root |
| mkdir -p build |
| cd build |
| CC=gcc CXX=g++ cmake -DCMAKE_BUILD_TYPE=debug -DVTR_ASSERT_LEVEL=3 .. |
| ``` |
| |
| Note that we explicitly asked for gcc and g++, the coverity build tool defaults to these compilers, and may not like the default 'cc' or 'c++' (even if they are linked to gcc/g++). |
| |
| 3. Run the coverity build tool |
| |
| ```shell |
| #From the build directory where we ran cmake |
| cov-build --dir cov-int make -j8 |
| ``` |
| |
| 4. Archive the output directory |
| |
| ```shell |
| tar -czvf vtr_coverity.tar.gz cov-int |
| ``` |
| |
| 5. Submit the archive through the coverity web interface |
| |
| Once the build has been analyzed you can browse the latest results throught the coverity web interface |
| |
| ### No files emitted |
| If you get the following warning from cov-build: |
| |
| [WARNING] No files were emitted. |
| |
| You may need to configure coverity to 'know' about your compiler. For example: |
| |
| ```shell |
| cov-configure --compiler `which gcc-7` |
| ``` |
| |
| On unix-like systems run `scan-build make` from the root VTR directory. |
| to output the html analysis to a specific folder, run `scan-build make -o /some/folder` |
| |
| # Release Procedures |
| |
| ## General Principles |
| |
| We periodically make 'official' VTR releases. |
| While we aim to keep the VTR master branch stable through-out development some users prefer to work of off an official release. |
| Historically this has coincided with the publishing of a paper detailing and carefully evaluating the changes from the previous VTR release. |
| This is particularly helpful for giving academics a named baseline version of VTR to which they can compare which has a known quality. |
| |
| In preparation for a release it may make sense to produce 'release candidates' which when fully tested and evaluated (and after any bug fixes) become the official release. |
| |
| ## Checklist |
| |
| The following outlines the procedure to following when making an official VTR release: |
| |
| * Check the code compiles on the list of supported compilers |
| * Check that all regression tests pass |
| * Update regression test golden results to match the released version |
| * Increment the version number (set in root CMakeLists.txt) |
| * Create a new entry in the CHANGELOG.md for the release, summarizing at a high-level user-facing changes |
| * Create a git annotated tag (e.g. `v8.0.0`) and push it to github |
| |
| |