
Scripts for benchmarking and plotting performance data #47


Open · draganaurosgrbic wants to merge 17 commits into main
Conversation

@draganaurosgrbic (Contributor) commented Jul 10, 2025

  • run.py - Python script for running a benchmark. The script fixes some parameters when running Tesseract (the --no-revisit-detector heuristic, the --sample-seed and --det-order-seed seeds for generating shots/simulations and detector orderings, the --print-stats flag for printing output results, etc.) and reads and parses the remaining arguments provided by the user: --shots (the number of shots/simulations), --detectors (the number of detector orderings), --pqlimit (the limit on the priority-queue size), --beam (the beam size), --threads (the number of threads used when running multiple shots/simulations), and --at-most-two-errors-per-detector (whether to use the heuristic that assumes at most two errors affect any detector)
  • run_and_collect.py - Python script for running a benchmark and collecting the measurement data. This script uses run.py to run the selected benchmark, then parses and collects the performance and accuracy data for the Tesseract decoder. It contains two major functions:
    • parse_output_line, which receives the last line of the output produced by Tesseract (the line containing the total execution time in seconds, the number of low-confidence results, and the number of errors)
    • parse_stim_path, which receives the file path of the benchmark being executed, parses it, and retrieves the code family (color, surface, bicycle, etc.) and the values of r, d, and p
  • plot.py - Python script for plotting the performance data. The script contains the plot function, which receives several arguments:
    • data, the data to be plotted: an array of elements, each with the fields label, before, and after. Field label is used as the label of the bar being plotted; before and after hold the metric (execution time, accuracy, etc.) for the two versions of the experiment being compared (before and after an optimization, with and without a specific flag, etc.)
    • the title of the entire graph
    • the title of the y-axis, i.e. the name of the metric being plotted (decoding time, accuracy, etc.)
    • the legend of the graph
    • the size of the graph
  • utils.py - helper Python module, which currently contains only a create_label function. It receives the code family (color, surface, bicycle, etc.), r, p, and the number of shots, and builds the string used to label the experiment's bar in the plotted graph

draganaurosgrbic and others added 17 commits June 14, 2025 14:52
for better data locality

Signed-off-by: Dragana Grbic <draganaurosgrbic@gmail.com>
…using '--at-most-two-errors-per-detector' flag
Check out this pull request on ReviewNB: see visual diffs & provide feedback on Jupyter Notebooks. (Powered by ReviewNB)

@draganaurosgrbic (Contributor, Author) commented:

@LalehB you can review the Jupyter notebook using the ReviewNB option

@draganaurosgrbic draganaurosgrbic changed the title Scripts for benchmarking and plotting quantum codes with and without using the '--at-most-two-errors-per-detector' flag Scripts for benchmarking and plotting quantum codes Jul 10, 2025
@draganaurosgrbic draganaurosgrbic changed the title Scripts for benchmarking and plotting quantum codes Scripts for benchmarking and plotting performance data Jul 10, 2025
draganaurosgrbic added a commit that referenced this pull request Jul 25, 2025
…--at-most-two-errors-per-detector flag (#45)

### Fixing the performance issue (and, implicitly, a pre-existing bug)

This PR fixes the performance issue of costly `std::vector` copy operations performed when the `--at-most-two-errors-per-detector` flag is enabled. As discussed in #27, the initial `next_next_blocked_errs = next_blocked_errs` line used to consume significant decoding time: each time a new search state was explored/constructed, this line made a local copy of the blocked errors of the current state, so that changes made to the current state would not affect states explored later. In #27, I realized these local copies of `std::vector` data structures were only needed when the `--at-most-two-errors-per-detector` flag was enabled, since only in that case did _Tesseract_ make additional changes to the vector of blocked errors that had to be reverted when the next search state was explored. In this PR, I remove the copies in the flag-enabled case as well, since it is highly inefficient to copy an entire large vector each time a new search state is explored, only to revert a few changes. I achieve this by storing a special value `2` (instead of `true`/`1`) for the errors that are blocked due to the `--at-most-two-errors-per-detector` flag and must be unblocked/reverted in the next search state. Note that this is possible to implement now because boolean elements are stored as integers, rather than as single bits that can only be `1` or `0`.
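A minimal sketch of this sentinel strategy, with hypothetical helper names (the real code stores the flag in the `error_blocked` field of `DetectorCostTuple`):

```
#include <cstdint>
#include <vector>

// 0 = unblocked, 1 = blocked for this search state,
// 2 = blocked only temporarily by --at-most-two-errors-per-detector and
//     reverted before the next search state is explored.
using BlockedFlag = std::uint8_t;

// Block every error adjacent to a detector, preserving existing hard
// blocks (1) and marking the new blocks as revertible (2).
void block_adjacent_errors(const std::vector<int>& adjacent_errors,
                           std::vector<BlockedFlag>& blocked) {
  for (int ei : adjacent_errors) {
    blocked[ei] = (blocked[ei] == 1) ? 1 : 2;
  }
}

// Instead of copying the whole vector only to throw the copy away, undo
// just the temporary blocks when moving on to the next search state.
void revert_temporary_blocks(const std::vector<int>& adjacent_errors,
                             std::vector<BlockedFlag>& blocked) {
  for (int ei : adjacent_errors) {
    if (blocked[ei] == 2) blocked[ei] = 0;
  }
}
```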

The cost of the copy operations removed by this change/optimization can be traced through several stages of the project:
1. Before I started working on _Tesseract_, the `--at-most-two-errors-per-detector` flag would frequently trigger copy operations on `std::vector<bool>`. As discussed in #25, this data structure improves memory efficiency by packing boolean elements into individual bits, but I replaced it with `std::vector<char>` because doing so drastically improved speed: _Tesseract_ frequently accesses these elements, and `std::vector<bool>` induces significant overhead due to the bit-wise operations it performs.
2. After I applied the optimization in #25, the copies were of `std::vector<char>`, which required more time, as the stored elements were larger.
3. Finally, after I implemented the optimization in #34, the copies were of `std::vector<DetectorCostTuple>`, where each element is 8 bytes, requiring even more time.

Since we were not using the `--at-most-two-errors-per-detector` flag (it affects the accuracy of the decoder), I had focused my optimizations on configurations where the flag is disabled. However, the _Tesseract_ algorithm was still left with code that frequently copied large vectors when the flag was enabled, only to revert a few changes. In this PR, I fix this performance issue by imposing a smarter strategy: storing a special value `2` for the errors that need to be unblocked/reverted when the next search state is explored.

Below are graphs evaluating the performance improvement achieved by removing these unnecessary copy operations when the `--at-most-two-errors-per-detector` flag is enabled. Note that I evaluate the removal of `std::vector<DetectorCostTuple>` copies, as this was the last data representation I implemented before this PR. I also noticed that this PR affects, and in fact improves, the accuracy of the decoder when the flag is used, because the code had a bug. The code below:

```
for (int d : edets[ei]) {
  next_detectors[d] = !next_detectors[d];
  int fired = next_detectors[d] ? 1 : -1;
  next_num_detectors += fired;
  for (int oei : d2e[d]) {
    next_detector_cost_tuples[oei].detectors_count += fired;
  }

  if (!next_detectors[d] && config.at_most_two_errors_per_detector) {
    for (size_t oei : d2e[d]) {
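      // BUG: the block is recorded in next_next_detector_cost_tuples, while
      // the fired-detector counts above were updated in next_detector_cost_tuples.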
      next_next_detector_cost_tuples[oei].error_blocked = true;
    }
  }
}
```

contains the critical loop with the bug. The loop updates the number of fired detectors for each error in `next_detector_cost_tuples`, but blocks errors in `next_next_detector_cost_tuples`. However, when the `get_detcost` function is called, `next_next_detector_cost_tuples` is passed as the argument. This inconsistency occurred only when the `--at-most-two-errors-per-detector` flag was enabled: in that case, `next_next_detector_cost_tuples` was constructed and passed to `get_detcost`, while the updates to the fired-detector counts were made to `next_detector_cost_tuples`. This explains the higher number of low-confidence results, as well as the 3 errors observed in a surface code benchmark (r=11, p=0.002, 500 shots).

In this PR, the buggy loop is replaced with:
```
for (int d : edets[ei]) {
  next_detectors[d] = !next_detectors[d];
  int fired = next_detectors[d] ? 1 : -1;
  next_num_detectors += fired;
  for (int oei : d2e[d]) {
    next_detector_cost_tuples[oei].detectors_count += fired;
  }

  if (!next_detectors[d] && config.at_most_two_errors_per_detector) {
    for (size_t oei : d2e[d]) {
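      // 2 marks a block added only for this flag; it is unblocked/reverted
      // before the next search state, instead of copying the whole vector.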
      next_detector_cost_tuples[oei].error_blocked = next_detector_cost_tuples[oei].error_blocked == 1 ? 1 : 2;
    }
  }
}
```

As explained earlier, this PR entirely removes `next_next_detector_cost_tuples` and replaces it with the smarter strategy of reverting the changes made to blocked errors. Since I completely removed this array, the bug is fixed as well: updates to the fired-detector counts and to the blocked errors are now always performed on the same `next_detector_cost_tuples` array.

**Note: I evaluated the change/improvement in accuracy only by comparing the number of low-confidence results. I also measured the number of errors when executing shots/simulations, but for the benchmarks below there were no errors either before or after this PR. The only exception was the surface code benchmark (r=11, p=0.002, 500 shots), where I encountered 3 errors (out of all 500 shots) before this PR and 0 after.**

<img width="1778" height="870" alt="Screenshot 2025-07-18 7 40 19 PM"
src="https://github.com/user-attachments/assets/a1f54d20-ee00-43bf-8c28-f29aaf487d80"
/>

<img width="1778" height="876" alt="Screenshot 2025-07-18 7 40 39 PM"
src="https://github.com/user-attachments/assets/87279c31-abae-4860-9bab-749eb02631ee"
/>

<img width="1778" height="877" alt="Screenshot 2025-07-18 7 41 01 PM"
src="https://github.com/user-attachments/assets/07f820fa-c2de-4029-b5c4-52c0e035dc71"
/>

<img width="1778" height="875" alt="Screenshot 2025-07-18 7 41 21 PM"
src="https://github.com/user-attachments/assets/8a4275ed-a787-4733-9e2c-8c609406edbc"
/>

### Analyzing the impact of this flag on the performance and accuracy
**Now that the flag's code path is fixed and optimized (on par with the path that does not use the flag), we can analyze the flag's impact on the performance and accuracy of the decoder.**

I first analyzed the performance and accuracy impact of this flag using the same benchmarks I used to test the performance/bug fix implemented in this PR. For these benchmarks, the flag provides somewhat better accuracy but lower performance. Below are graphs comparing the accuracy and performance with and without the `--at-most-two-errors-per-detector` flag.

<img width="1778" height="985" alt="Screenshot 2025-07-18 7 41 52 PM"
src="https://github.com/user-attachments/assets/d4e4f5f3-75e8-46e1-9fdf-51537316d652"
/>

<img width="1778" height="874" alt="Screenshot 2025-07-18 7 42 10 PM"
src="https://github.com/user-attachments/assets/b31b28cd-d9a9-4a2a-bae2-88f78a16a5e0"
/>

<img width="1778" height="986" alt="Screenshot 2025-07-18 7 42 27 PM"
src="https://github.com/user-attachments/assets/c6ea54d6-0738-44b3-924f-4632291e41e1"
/>

<img width="1778" height="883" alt="Screenshot 2025-07-18 7 42 44 PM"
src="https://github.com/user-attachments/assets/e5f64154-da58-406d-8e47-2be2a0004a37"
/>

### More data on the performance and accuracy impact of the flag
I performed additional experiments/benchmarks to collect more
comprehensive data on the impact of this flag on the performance and
accuracy of the _Tesseract_ decoder. Below are plots for various groups
of codes. It confirms that for (most of) benchmarks it provides somewhat
better accuracy, but lower performance.

<img width="1763" height="980" alt="Screenshot 2025-07-18 1 50 52 PM"
src="https://github.com/user-attachments/assets/3ac277dd-4e9d-418c-956d-dc331ef12019"
/>

<img width="1763" height="981" alt="Screenshot 2025-07-18 1 52 49 PM"
src="https://github.com/user-attachments/assets/9c7e50ef-7bb2-4805-8e8c-d1df4152cc10"
/>

<img width="1762" height="981" alt="Screenshot 2025-07-18 1 55 53 PM"
src="https://github.com/user-attachments/assets/1803cccf-4f25-4b9a-bb2a-3818412f60de"
/>

<img width="1762" height="980" alt="Screenshot 2025-07-18 1 57 34 PM"
src="https://github.com/user-attachments/assets/b5645353-8168-4b39-9473-4c3ed425083c"
/>

<img width="1748" height="981" alt="Screenshot 2025-07-18 2 02 48 PM"
src="https://github.com/user-attachments/assets/1084b196-365a-4e3b-a65d-bacd19929760"
/>

<img width="1748" height="981" alt="Screenshot 2025-07-18 2 04 45 PM"
src="https://github.com/user-attachments/assets/c6cbcf78-26ab-48e2-a9a3-2ff1faf3c5dc"
/>

<img width="1756" height="989" alt="Screenshot 2025-07-18 2 08 19 PM"
src="https://github.com/user-attachments/assets/e5f98227-b5e0-4eba-885f-571908d183a0"
/>

<img width="1746" height="988" alt="Screenshot 2025-07-18 3 15 07 PM"
src="https://github.com/user-attachments/assets/81c6133e-e34d-403d-85fc-320042311120"
/>

**The results show that the decoding speedup from disabling the flag ranges from around 0% to over 40%; only in very rare cases does the flag itself provide a (very small) performance improvement. The accuracy improvement from enabling the flag ranges from 0% to over 30%, indicating that it can have a significant positive impact on accuracy.**

### Major contributions of the PR:
- Removes the performance degradation that my earlier optimizations introduced for flag-enabled configurations; those optimizations targeted configurations that do not use this flag and significantly improved decoding time in that setting
- Completely removes the inefficient/redundant `std::vector` copy operations that propagated from the `next_next_blocked_errs = next_blocked_errs` line that existed before (mentioned in PR #27)
- Fixes the performance issue/bug that existed when using the `--at-most-two-errors-per-detector` flag, where large vectors were frequently copied in each decoding iteration only to revert a few changes (note that this performance issue escalated because of the changes in data representation that were necessary for the previous optimization strategies)
- Extensive experiments/benchmarks evaluating the impact of the performance issue/bug fix
- Extensive experiments/benchmarks evaluating the impact of the flag itself on the performance and accuracy of the decoder

### Does it provide better performance on any benchmark now?
I also tested a benchmark our team looked at during the last meeting, where we saw that the `--at-most-two-errors-per-detector` flag did provide better performance. Specifically, I ran:

`bazel build src:all && time ./bazel-bin/src/tesseract --pqlimit 200000
--beam 5 --num-det-orders 20 --sample-num-shots 20 --det-order-seed
13267562 --circuit
testdata/colorcodes/r\=9\,d\=9\,p\=0.002\,noise\=si1000\,c\=superdense_color_code_X\,q\=121\,gates\=cz.stim
--sample-seed 717347 --threads 1 --print-stats`

with and without the `--at-most-two-errors-per-detector` flag. However, the execution time without the flag was 69.01 seconds, versus 74.23 seconds with it. There were no errors or low-confidence results in either run. I believe the benchmark we looked at during our last meeting used a _Tesseract_ build from before my optimization in #34. If so, this shows that my optimizations had a higher impact when the flag is not used, and that the performance improvement they achieve outweighs the flag's initial speedup.

**Conclusion: I am very confident that the current version of the _Tesseract_ algorithm is faster without this flag, due to the optimizations I implemented in the `get_detcost` function. When the `--at-most-two-errors-per-detector` flag is enabled, more errors are blocked, preventing them from influencing detector costs and, therefore, the `get_detcost` function itself. I invested a lot of time accelerating `get_detcost`, so the other speedups this flag initially achieved no longer outweigh the impact of #34.**

PR #47 contains the code/scripts I used to benchmark and compare color,
surface, and bicycle codes with and without using the
`--at-most-two-errors-per-detector` flag.

---------

Signed-off-by: Dragana Grbic <draganaurosgrbic@gmail.com>
Co-authored-by: noajshu <shutty@google.com>
Co-authored-by: LaLeh <lalehbeni@google.com>
draganaurosgrbic added a commit that referenced this pull request Jul 30, 2025
### Hashing Syndrome Patterns with `boost::dynamic_bitset`
In this PR, I address a key performance bottleneck: the hashing of fired-detector patterns (syndrome patterns). I introduce `boost::dynamic_bitset` from the Boost library, a data structure that combines the memory-saving bit-packing of `std::vector<bool>` with highly optimized bit-wise operations, enabling access and modification as fast as in `std::vector<char>`. Crucially, `boost::dynamic_bitset` also provides highly optimized, built-in hashing of sequences of boolean elements.

---

### Initial Optimization: `std::vector<bool>` to `std::vector<char>`
The initial _Tesseract_ implementation, as documented in #25, utilized
`std::vector<bool>` to store patterns of fired detectors and predicates
that block specific errors from being added to the current error
hypothesis. While `std::vector<bool>` optimizes memory usage by packing
elements into individual bits, accessing and modifying its elements is
highly inefficient due to its reliance on proxy objects that perform
costly bit-wise operations (shifting, masking). Given _Tesseract_'s
frequent access and modification of these elements, this caused
significant performance overheads.

In #25, I transitioned from `std::vector<bool>` to `std::vector<char>`.
This change made boolean elements addressable bytes, enabling efficient
and direct byte-level access. Although this increased memory footprint
(as each boolean was stored as a full byte), it delivered substantial
performance gains by eliminating `std::vector<bool>`'s proxy objects and
their associated overheads for element access and modification. Speedups
achieved with this initial optimization were significant:
* For Color Codes, speedups reached 17.2%-32.3%
* For Bivariate-Bicycle Codes, speedups reached 13.0%-22.3%
* For Surface Codes, speedups reached 33.4%-42.5%
* For Transversal CNOT Protocols, speedups reached 12.2%-32.4%

These significant performance gains highlight the importance of choosing
appropriate data structures for boolean sequences, especially in
performance-sensitive applications like _Tesseract_. The remarkable
42.5% speedup achieved in Surface Codes with this initial switch
underscores the substantial overhead caused by unsuitable data
structures. The performance gain from removing `std::vector<bool>`'s
proxy objects and their inefficient operations far outweighed any
overhead from increased memory consumption.
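As an illustration of the access-pattern difference, consider flipping every element (a sketch with a hypothetical function name, not the _Tesseract_ source):

```
#include <cstddef>
#include <vector>

void flip_all(std::vector<bool>& bits, std::vector<char>& bytes) {
  // std::vector<bool> is bit-packed: operator[] returns a proxy object,
  // and every read/write turns into shift-and-mask operations on a
  // packed word.
  for (std::size_t i = 0; i < bits.size(); ++i) bits[i] = !bits[i];

  // std::vector<char> stores one addressable byte per element, so the
  // same loop is a direct byte load/store with no proxy objects.
  for (std::size_t i = 0; i < bytes.size(); ++i) bytes[i] = !bytes[i];
}
```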

---

### Current Bottleneck: `std::vector<char>` and Hashing
Following the optimizations in #25, _Tesseract_ continued to use `std::vector<char>` for storing and managing patterns of fired detectors and the predicates that block errors. Subsequently, PR #34 merged the vectors of blocked errors, together with per-error detector counts, into the `DetectorCostTuple` structure, which stores `error_blocked` and `detectors_count` as `uint32_t` fields (for the reasons explained in #34). These changes left the vectors of fired detectors as the sole remaining `std::vector<char>` data structure in this context.
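For reference, the shape of that structure as described here and in #34 (a sketch; the field order is an assumption):

```
#include <cstdint>

// Two uint32_t fields per error, packed into 8 bytes.
struct DetectorCostTuple {
  uint32_t error_blocked;    // 0/1, with 2 as the revertible marker from #45
  uint32_t detectors_count;  // number of fired detectors adjacent to this error
};
static_assert(sizeof(DetectorCostTuple) == 8, "8 bytes per element, as noted above");
```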

After implementing and evaluating optimizations in #25, #27, #34, and
#45, profiling _Tesseract_ to analyze remaining bottlenecks revealed
that, aside from the `get_detcost` function, a notable bottleneck
emerged: `VectorCharHash` (originally `VectorBoolHash`). This function
is responsible for hashing patterns of fired detectors to prevent
re-exploring previously visited syndrome states. The implementation of
`VectorCharHash` involved iterating through each element, byte by byte,
and accumulating the hash. Even though this function saw significant
speedups with the initial switch from `std::vector<bool>` to
`std::vector<char>`, hashing patterns of fired detectors still consumed
considerable time. Post-optimization profiling (after #25, #27, #34, and
#45) revealed that this hashing function consumed approximately 25% of
decoding time in Surface Codes, 30% in Transversal CNOT Protocols, 10%
in Color Codes, and 2% in Bivariate-Bicycle Codes (`get_detcost`
remained the primary bottleneck for Bivariate-Bicycle Codes). Therefore,
I decided to explore opportunities to further optimize this function and
enhance the decoding speed.
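In essence, that hashing was a per-element hash-combine of this shape (an illustrative sketch, not the verbatim `VectorCharHash` source):

```
#include <cstddef>
#include <vector>

// Every stored byte is visited individually and folded into the running
// hash, so the cost grows with the number of detectors per syndrome.
struct ByteByByteHash {
  std::size_t operator()(const std::vector<char>& v) const {
    std::size_t h = 0;
    for (char c : v) {
      // Classic hash-combine step: a few ALU operations per byte.
      h ^= static_cast<std::size_t>(c) + 0x9e3779b9u + (h << 6) + (h >> 2);
    }
    return h;
  }
};
```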

---

### Solution: Introducing `boost::dynamic_bitset`
This PR addresses the performance bottleneck of hashing fired detector
patterns and mitigates the increased memory footprint from the initial
switch to `std::vector<char>` by introducing the `boost::dynamic_bitset`
data structure. The C++ standard library's `std::bitset` offers an ideal
conceptual solution: memory-efficient bit-packed storage (like
`std::vector<bool>`) combined with highly efficient access and
modification operations (like `std::vector<char>`). This data structure
achieves efficient access and modification by employing highly optimized
bit-wise operations, thereby reducing performance overhead stemming from
proxy objects in `std::vector<bool>`. However, `std::bitset` requires a
static size (determined at compile-time), rendering it unsuitable for
_Tesseract_'s dynamically sized syndrome patterns.

The Boost library's `boost::dynamic_bitset` provides the perfect
solution by offering dynamic-sized bit arrays whose dimensions can be
determined at runtime. This data structure brilliantly combines the
memory efficiency of `std::vector<bool>` (by packing elements into
individual bits) with the performance benefits of direct element access
and modification, similar to `std::vector<char>`. This is achieved by
internally storing bits within a contiguous array of fundamental integer
types (e.g., `unsigned long` or `uint64_t`) and accessing/modifying
elements using highly optimized bit-wise operations, thus avoiding the
overheads of `std::vector<bool>`'s proxy objects and costly bit-wise
operations. Furthermore, `boost::dynamic_bitset` offers highly
optimized, built-in hashing functions, replacing our custom, less
efficient byte-by-byte hashing and resulting in a cleaner, faster
implementation.
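A minimal usage sketch (assuming Boost 1.71 or newer, where `boost::hash` support for `dynamic_bitset` is built in):

```
#include <boost/dynamic_bitset.hpp>
#include <boost/functional/hash.hpp>
#include <unordered_set>

int main() {
  // The size is chosen at runtime, unlike std::bitset.
  boost::dynamic_bitset<> syndrome(1000);
  syndrome.set(3);    // detector 3 fired
  syndrome.flip(42);  // toggle detector 42

  // The hash works on the underlying integer blocks rather than element
  // by element, and plugs directly into an unordered container of
  // visited syndrome states.
  std::unordered_set<boost::dynamic_bitset<>,
                     boost::hash<boost::dynamic_bitset<>>>
      visited;
  visited.insert(syndrome);
  return 0;
}
```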

---

### Performance Evaluation: Individual Impact of Optimization
I performed two types of experiments to evaluate the achieved
performance gains. First, I conducted extensive benchmarks across
various code families and configurations to evaluate the individual
performance gains achieved by this specific optimization. Speedups
achieved include:
* For Surface Codes: 8.0%-24.7%
* For Transversal CNOT Protocols: 12.1%-26.8%
* For Color Codes: 3.6%-7.0%
* For Bivariate-Bicycle Codes: 0.5%-4.8%

These results show the highest impact in Surface Codes and Transversal CNOT Protocols, which aligns with the profiling data showing that these code families spent the most time in the original `VectorCharHash` function.

---

#### Speedups in Surface Codes

<img width="1990" height="989" alt="img1"
src="https://github.com/user-attachments/assets/04044da5-a980-4282-a6fe-4debfa815f41"
/>

---

#### Speedups in Transversal CNOT Protocols

<img width="1990" height="989" alt="img2"
src="https://github.com/user-attachments/assets/f79e4d7d-5cfc-4077-be1a-13ef92a2d65a"
/>

<img width="1990" height="989" alt="img3"
src="https://github.com/user-attachments/assets/35a9b672-07d3-45ea-9334-23dd85760925"
/>

---

#### Speedups in Color Codes

<img width="1990" height="989" alt="img4"
src="https://github.com/user-attachments/assets/2b52c4fd-5137-47f0-9bae-7c667c740ff0"
/>

<img width="1990" height="989" alt="img5"
src="https://github.com/user-attachments/assets/e7883dec-5a88-4b2b-914b-3d12a1843d6f"
/>

---

#### Speedups in Bivariate-Bicycle Codes

<img width="1990" height="989" alt="img6"
src="https://github.com/user-attachments/assets/bd530a3b-da17-4ac1-bf68-702aaafe6047"
/>

<img width="1990" height="989" alt="img7"
src="https://github.com/user-attachments/assets/2d2f2576-0b16-4f0a-b8a2-221723250945"
/>

---

### Performance Evaluation: Cumulative Speedup
Following the evaluation of individual performance gains, I analyzed the
cumulative effect of the optimizations implemented across PRs #25, #27,
#34, and #45. The cumulative speedups achieved are:
* For Color Codes: 40.7%-54.8%
* For Bivariate-Bicycle Codes: 41.5%-80.3%
* For Surface Codes: 50.0%-62.4%
* For Transversal CNOT Protocols: 57.8%-63.6%

These results demonstrate that my optimizations achieved over 2x speedup
in Color Codes, over 2.5x speedup in Surface Codes and Transversal CNOT
Protocols, and over 5x speedup in Bivariate-Bicycle Codes.

---

#### Speedups in Color Codes

<img width="1990" height="989" alt="img1"
src="https://github.com/user-attachments/assets/cd81dc98-8599-4740-b00c-4ff396488f69"
/>

<img width="1990" height="989" alt="img2"
src="https://github.com/user-attachments/assets/c337ddcf-44f0-4641-91df-2a6d3c586680"
/>

---

#### Speedups in Bivariate-Bicycle Codes

<img width="1990" height="989" alt="img3"
src="https://github.com/user-attachments/assets/a57cf9e2-4c2c-44e8-8a6e-1860b1544cbd"
/>

<img width="1990" height="989" alt="img4"
src="https://github.com/user-attachments/assets/fde60159-fd7f-4893-b30d-34da844ac452"
/>

---

#### Speedups in Surface Codes

<img width="1990" height="989" alt="img5"
src="https://github.com/user-attachments/assets/57234d33-201b-41a9-b867-15e9ff87e666"
/>

---

#### Speedups in Transversal CNOT Protocols

<img width="1990" height="989" alt="img6"
src="https://github.com/user-attachments/assets/5780843d-2055-4870-9454-50184a268ad1"
/>

---

### Conclusion
These results demonstrate that the `boost::dynamic_bitset` optimization
significantly impacts code families where the original hashing function
(`VectorCharHash`) was a primary bottleneck (Surface Codes and
Transversal CNOT Protocols). The substantial speedups achieved in these
code families validate that `boost::dynamic_bitset` provides
demonstrably more efficient hashing and bit-wise operations. For code
families where hashing was less of a bottleneck (Color Codes and
Bivariate-Bicycle Codes), the speedups were modest, reinforcing that
`std::vector<char>` can remain highly efficient even with increased
memory usage when bit packing is not the primary performance concern.
Crucially, this optimization delivers comparable or superior performance
to `std::vector<char>` while simultaneously reducing memory footprint,
providing additional speedups where hashing performance is critical.

---

### Key Contributions
* Identified the hashing of syndrome patterns as the primary remaining
bottleneck in Surface Codes and Transversal CNOT Protocols, post prior
optimizations (#25, #27, #34, #45).
* Adopted `boost::dynamic_bitset` as a superior data structure, combining `std::vector<bool>`'s memory efficiency with high-performance bit-wise operations and built-in hashing, enabling access and modification as fast as in `std::vector<char>`.
* Replaced `std::vector<char>` with `boost::dynamic_bitset` for storing
syndrome patterns.
* Performed extensive benchmarking to evaluate both the individual
impact of this optimization and its cumulative effect with prior PRs.
* Achieved significant individual speedups (e.g., 8.0%-24.7% in Surface
Codes, 12.1%-26.8% in Transversal CNOT Protocols) and substantial
cumulative speedups (over 2x in Color Codes, over 2.5x in Surface Codes
and Transversal CNOT Protocols, and over 5x in Bivariate-Bicycle Codes).

PR #47 contains the scripts I used for benchmarking and plotting the
results.

---------

Signed-off-by: Dragana Grbic <draganaurosgrbic@gmail.com>
Co-authored-by: noajshu <shutty@google.com>
Co-authored-by: LaLeh <lalehbeni@google.com>