Fixing the performance issue/bug in copying large vectors when `--at-most-two-errors-per-detector` flag enabled #45
Conversation
LalehB left a comment:
Thanks, Dragana!
Could you include some benchmarking results for both performance and accuracy? It would be good to quantify the impact of this change.
@LalehB Please see the updated description of the PR. It should answer all of your questions.
Thank you, @draganaurosgrbic, for addressing the comments. I noticed that you're reporting the number of low-confidence events to quantify accuracy. Could you also include the error count for completeness?
This is a pretty incredible speedup when the flag is enabled! Although the flag does affect accuracy, I wonder if we couldn't use a similar technique but possibly allow more than 2. I also wonder just how much accuracy is lost for, e.g., the large color codes.
+1, could you share the error rates as well, please?
@LalehB Please check the PR again, I added the plots that compare the speed and accuracy with and without using the `--at-most-two-errors-per-detector` flag.
@LalehB Please see the updated description of the PR; it explains the change/improvement in the accuracy of the decoder. There was a bug before this PR that occurred only when using this flag. This PR also fixes that bug.
Thank you @draganaurosgrbic for the update to this PR. I am very confused by the data chart that you included.
@LalehB Yes, when I performed these benchmarks, I was getting lower performance when using this flag, but somewhat better accuracy (check the graphs for the accuracy/low-confidence counts).
@LalehB Please also note the two very important contributions of this PR: it completely fixes the performance bug of copying large vectors, and it fixes a bug we didn't know existed before (inconsistent updates on `next_detector_cost_tuples` and `next_next_detector_cost_tuples`).
@draganaurosgrbic I thought the accuracy improvement was because of the bug fix (the inconsistent updates on `next_detector_cost_tuples` and `next_next_detector_cost_tuples`); however, would you please explain to me how the `--at-most-two-errors-per-detector` flag itself improves accuracy?
@LalehB Please check the PR description again, there are two accuracy changes/improvements I analyzed. The first one is before and after I fixed the performance issue/bug (the inconsistent updates on `next_detector_cost_tuples` and `next_next_detector_cost_tuples`). The second one is with and without the `--at-most-two-errors-per-detector` flag after that fix, i.e., the impact of the heuristic itself.
@LalehB The first analysis/comparison of accuracy is the comparison before and after I fixed the bug. I realized there was a bug because, when I completely removed the redundant `next_next_detector_cost_tuples` vector, the accuracy of the decoder improved.
@draganaurosgrbic I think it would be good to touch base on some of the project goals here.
@LalehB Thank you for your feedback. Maybe my PR description is not clear enough. Please read this: no, this PR has two logical parts: the performance/bug fix, and the analysis of the `--at-most-two-errors-per-detector` heuristic itself.
@LalehB Please check the PR description again: these graphs show the accuracy improvement coming from the performance fix (and, implicitly, the bug fix). They are already included in the PR description.
@LalehB Please check the PR description again: these graphs show the accuracy improvement coming from the heuristic itself. They are already included in the PR description.
@LalehB Hope this makes sense now. All of these graphs are already included in the PR description. I thought the logical flow of all of this was clear from the description; I have restructured the description, and it should be clearer now.
@LalehB @noajshu The PR description is updated, including more comprehensive data for analyzing the impact of the flag on performance and accuracy, a summary list of the major contributions, and the results for the benchmark we looked at during our last meeting. I have also updated the graphs to be clearer. If you run into any issues, please let me know.
LalehB left a comment:
LGTM!
### Hashing Syndrome Patterns with `boost::dynamic_bitset`

In this PR, I address a key performance bottleneck: the hashing of fired-detector patterns (syndrome patterns). I introduce `boost::dynamic_bitset` from the Boost library, a data structure that combines the memory-saving bit packing of `std::vector<bool>` with fast access and modification like `std::vector<char>`, and that additionally provides highly optimized, built-in hashing of boolean sequences.

---

### Initial Optimization: `std::vector<bool>` to `std::vector<char>`

The initial _Tesseract_ implementation, as documented in #25, used `std::vector<bool>` to store patterns of fired detectors and predicates that block specific errors from being added to the current error hypothesis. While `std::vector<bool>` optimizes memory usage by packing elements into individual bits, accessing and modifying its elements is highly inefficient due to its reliance on proxy objects that perform costly bit-wise operations (shifting, masking). Given _Tesseract_'s frequent access to and modification of these elements, this caused significant performance overhead.

In #25, I transitioned from `std::vector<bool>` to `std::vector<char>`. This change made boolean elements addressable bytes, enabling efficient, direct byte-level access. Although this increased the memory footprint (each boolean was stored as a full byte), it delivered substantial performance gains by eliminating `std::vector<bool>`'s proxy objects and their associated overhead for element access and modification. Speedups achieved with this initial optimization were significant:

* For Color Codes, speedups reached 17.2%-32.3%
* For Bivariate-Bicycle Codes, speedups reached 13.0%-22.3%
* For Surface Codes, speedups reached 33.4%-42.5%
* For Transversal CNOT Protocols, speedups reached 12.2%-32.4%

These gains highlight the importance of choosing appropriate data structures for boolean sequences in performance-sensitive applications like _Tesseract_. The remarkable 42.5% speedup in Surface Codes underscores the substantial overhead caused by an unsuitable data structure: the gain from removing `std::vector<bool>`'s proxy objects and their inefficient operations far outweighed any overhead from increased memory consumption. The sketch below illustrates the access-pattern difference.
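To make the difference concrete, here is a minimal illustrative sketch (not code from _Tesseract_; the function names are invented for the example) contrasting element access in the two representations:

```cpp
#include <cstddef>
#include <vector>

// With std::vector<bool>, operator[] returns a proxy object, so every
// read and write performs shift/mask work on a packed word.
void toggle_packed(std::vector<bool>& fired, std::size_t i) {
  fired[i] = !fired[i];  // proxy read (shift + mask), then proxy write
}

// With std::vector<char>, each flag is an addressable byte, so the same
// update is a plain byte load and store.
void toggle_bytes(std::vector<char>& fired, std::size_t i) {
  fired[i] = !fired[i];  // direct byte access
}
```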
---

### Current Bottleneck: `std::vector<char>` and Hashing

Following the optimizations in #25, _Tesseract_ continued to use `std::vector<char>` to store and manage patterns of fired detectors and predicates that block errors. Subsequently, PR #34 merged the vectors of blocked errors into the `DetectorCostTuple` structure, which stores `error_blocked` and `detectors_count` as `uint32_t` fields (for the reasons explained in #34). These changes left the vectors of fired detectors as the sole remaining `std::vector<char>` data structure in this context.

After implementing and evaluating the optimizations in #25, #27, #34, and #45, profiling _Tesseract_ for remaining bottlenecks revealed that, aside from the `get_detcost` function, a notable bottleneck remained: `VectorCharHash` (originally `VectorBoolHash`). This function hashes patterns of fired detectors to prevent re-exploring previously visited syndrome states. Its implementation iterated through each element, byte by byte, accumulating the hash. Even though this function saw significant speedups with the initial switch from `std::vector<bool>` to `std::vector<char>`, hashing patterns of fired detectors still consumed considerable time. Post-optimization profiling (after #25, #27, #34, and #45) showed that this hashing function consumed approximately 25% of decoding time in Surface Codes, 30% in Transversal CNOT Protocols, 10% in Color Codes, and 2% in Bivariate-Bicycle Codes (`get_detcost` remained the primary bottleneck for Bivariate-Bicycle Codes). I therefore explored opportunities to further optimize this function and enhance decoding speed.

---

### Solution: Introducing `boost::dynamic_bitset`

This PR addresses the hashing bottleneck, and mitigates the increased memory footprint from the earlier switch to `std::vector<char>`, by introducing `boost::dynamic_bitset`. The C++ standard library's `std::bitset` offers the ideal conceptual solution: memory-efficient bit-packed storage (like `std::vector<bool>`) combined with efficient access and modification (like `std::vector<char>`), achieved through optimized whole-word bit-wise operations rather than per-element proxy objects. However, `std::bitset` requires a size fixed at compile time, making it unsuitable for _Tesseract_'s dynamically sized syndrome patterns.

`boost::dynamic_bitset` removes that restriction by providing bit arrays whose dimensions are determined at runtime. It combines the memory efficiency of `std::vector<bool>` (packing elements into individual bits) with the performance benefits of near-direct element access, by internally storing bits in a contiguous array of fundamental integer types (e.g., `unsigned long` or `uint64_t`) and accessing them with optimized bit-wise operations, avoiding the overhead of `std::vector<bool>`'s proxy objects. Furthermore, `boost::dynamic_bitset` offers an optimized, built-in hash, replacing our custom, less efficient byte-by-byte hashing and resulting in a cleaner, faster implementation.
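As a minimal sketch of how the new representation can be used to deduplicate syndrome patterns (an illustration, not the actual _Tesseract_ code; it assumes Boost ≥ 1.71, where `dynamic_bitset` gained its built-in `hash_value` overload):

```cpp
#include <unordered_set>

#include <boost/dynamic_bitset.hpp>
#include <boost/functional/hash.hpp>

int main() {
  const std::size_t num_detectors = 1000;

  // Bit-packed like std::vector<bool>, but sized at runtime and backed
  // by a contiguous array of machine words.
  boost::dynamic_bitset<> syndrome(num_detectors);
  syndrome.set(3);    // detector 3 fired
  syndrome.flip(42);  // single word-level operation, no proxy objects

  // boost::hash picks up dynamic_bitset's hash_value() overload, which
  // hashes word by word instead of byte by byte as VectorCharHash did.
  std::unordered_set<boost::dynamic_bitset<>,
                     boost::hash<boost::dynamic_bitset<>>> visited;
  visited.insert(syndrome);

  // A previously seen pattern is rejected without re-exploration.
  const bool already_seen = !visited.insert(syndrome).second;
  return already_seen ? 0 : 1;
}
```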
---

### Performance Evaluation: Individual Impact of Optimization

I performed two types of experiments to evaluate the achieved performance gains. First, I conducted extensive benchmarks across various code families and configurations to measure the individual gains from this specific optimization. Speedups achieved include:

* For Surface Codes: 8.0%-24.7%
* For Transversal CNOT Protocols: 12.1%-26.8%
* For Color Codes: 3.6%-7.0%
* For Bivariate-Bicycle Codes: 0.5%-4.8%

The impact is highest in Surface Codes and Transversal CNOT Protocols, which aligns with the profiling data showing that these code families spent the most time in the original `VectorCharHash` function.

---

#### Speedups in Surface Codes

![img1](https://github.com/user-attachments/assets/04044da5-a980-4282-a6fe-4debfa815f41)

---

#### Speedups in Transversal CNOT Protocols

![img2](https://github.com/user-attachments/assets/f79e4d7d-5cfc-4077-be1a-13ef92a2d65a)
![img3](https://github.com/user-attachments/assets/35a9b672-07d3-45ea-9334-23dd85760925)

---

#### Speedups in Color Codes

![img4](https://github.com/user-attachments/assets/2b52c4fd-5137-47f0-9bae-7c667c740ff0)
![img5](https://github.com/user-attachments/assets/e7883dec-5a88-4b2b-914b-3d12a1843d6f)

---

#### Speedups in Bivariate-Bicycle Codes

![img6](https://github.com/user-attachments/assets/bd530a3b-da17-4ac1-bf68-702aaafe6047)
![img7](https://github.com/user-attachments/assets/2d2f2576-0b16-4f0a-b8a2-221723250945)

---

### Performance Evaluation: Cumulative Speedup

Following the evaluation of individual performance gains, I analyzed the cumulative effect of the optimizations implemented across PRs #25, #27, #34, and #45. The cumulative speedups achieved are:

* For Color Codes: 40.7%-54.8%
* For Bivariate-Bicycle Codes: 41.5%-80.3%
* For Surface Codes: 50.0%-62.4%
* For Transversal CNOT Protocols: 57.8%-63.6%

These results correspond to over 2x speedup in Color Codes, over 2.5x in Surface Codes and Transversal CNOT Protocols, and over 5x in Bivariate-Bicycle Codes.

---

#### Speedups in Color Codes

![img1](https://github.com/user-attachments/assets/cd81dc98-8599-4740-b00c-4ff396488f69)
![img2](https://github.com/user-attachments/assets/c337ddcf-44f0-4641-91df-2a6d3c586680)

---

#### Speedups in Bivariate-Bicycle Codes

![img3](https://github.com/user-attachments/assets/a57cf9e2-4c2c-44e8-8a6e-1860b1544cbd)
![img4](https://github.com/user-attachments/assets/fde60159-fd7f-4893-b30d-34da844ac452)

---

#### Speedups in Surface Codes

![img5](https://github.com/user-attachments/assets/57234d33-201b-41a9-b867-15e9ff87e666)

---

#### Speedups in Transversal CNOT Protocols

![img6](https://github.com/user-attachments/assets/5780843d-2055-4870-9454-50184a268ad1)

---

### Conclusion

These results demonstrate that the `boost::dynamic_bitset` optimization has the greatest impact in the code families where the original hashing function (`VectorCharHash`) was a primary bottleneck: Surface Codes and Transversal CNOT Protocols. The substantial speedups achieved there confirm that `boost::dynamic_bitset` provides demonstrably more efficient hashing and bit-wise operations. For code families where hashing was less of a bottleneck (Color Codes and Bivariate-Bicycle Codes), the speedups were modest, reinforcing that `std::vector<char>` can remain highly efficient, even with increased memory usage, when bit packing is not the primary performance concern.
Crucially, this optimization delivers comparable or superior performance to `std::vector<char>` while simultaneously reducing the memory footprint, providing additional speedups where hashing performance is critical.

---

### Key Contributions

* Identified the hashing of syndrome patterns as the primary remaining bottleneck in Surface Codes and Transversal CNOT Protocols after prior optimizations (#25, #27, #34, #45).
* Adopted `boost::dynamic_bitset`, which combines `std::vector<bool>`'s memory efficiency with high-performance bit-wise operations, access and modification comparable to `std::vector<char>`, and built-in hashing.
* Replaced `std::vector<char>` with `boost::dynamic_bitset` for storing syndrome patterns.
* Performed extensive benchmarking to evaluate both the individual impact of this optimization and its cumulative effect with prior PRs.
* Achieved significant individual speedups (e.g., 8.0%-24.7% in Surface Codes, 12.1%-26.8% in Transversal CNOT Protocols) and substantial cumulative speedups (over 2x in Color Codes, over 2.5x in Surface Codes and Transversal CNOT Protocols, and over 5x in Bivariate-Bicycle Codes).

PR #47 contains the scripts I used for benchmarking and plotting the results.

---------

Signed-off-by: Dragana Grbic <[email protected]>
Co-authored-by: noajshu <[email protected]>
Co-authored-by: LaLeh <[email protected]>





## Fixing a Performance Bottleneck and a Related Bug When Using the `--at-most-two-errors-per-detector` Flag

This PR addresses a critical performance issue and a related bug that existed when the `--at-most-two-errors-per-detector` flag was enabled. The core of the problem was the highly inefficient copying of large `std::vector` data structures, which was necessary for state management (mentioned in #27) but became prohibitively expensive after previous optimizations enlarged these structures. This PR completely removes these costly copy operations and, in doing so, implicitly fixes a bug that was degrading decoder accuracy.

### The Performance Issue (and also a Bug)
This PR resolves a performance degradation that was present exclusively when the `--at-most-two-errors-per-detector` flag was used. While the code path for these redundant copy operations existed from the beginning, its performance impact was initially masked by other bottlenecks. After significant speedups were achieved elsewhere through the optimizations in PRs #25, #27, and #34, this flag-specific degradation became a major and prominent bottleneck; the cost of the copy operations also escalated as those PRs increased the size of the copied data structures. The current work resolves this last remaining degradation, ensuring that even when this flag is enabled, we achieve a level of speedup consistent with other scenarios.

### The Technical Solution
To solve this, I replaced the expensive copy-and-revert strategy with a more intelligent state-management method. Instead of making a copy, I now store a special value (`2`) in the `error_blocked` field of the `DetectorCostTuple` struct. This special value indicates that an error was blocked specifically due to the `--at-most-two-errors-per-detector` flag and should be unblocked (or "reverted") in the next search state. This is possible because the `error_blocked` field is a `uint32_t`, allowing it to hold values beyond a simple `true`/`false` (`1` or `0`). This new approach completely removes the need for the `next_next_detector_cost_tuples` vector: all changes are now made on a single vector (`next_detector_cost_tuples`), which is much more efficient.
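A minimal sketch of this marker-value mechanism (the struct fields follow the PR description; the helper functions and the exact blocking condition are hypothetical simplifications of the real search loop):

```cpp
#include <cstdint>
#include <vector>

struct DetectorCostTuple {
  uint32_t error_blocked;    // 0 = free, 1 = blocked normally,
                             // 2 = blocked only for the current state by
                             //     --at-most-two-errors-per-detector
  uint32_t detectors_count;
};

// Block an error for the current search state only.
void block_for_flag(std::vector<DetectorCostTuple>& tuples, std::size_t e) {
  if (tuples[e].error_blocked == 0) tuples[e].error_blocked = 2;
}

// Reverting is one cheap in-place pass that clears the markers, instead
// of copying the whole vector up front and restoring it afterwards.
void revert_flag_blocks(std::vector<DetectorCostTuple>& tuples) {
  for (DetectorCostTuple& t : tuples) {
    if (t.error_blocked == 2) t.error_blocked = 0;
  }
}
```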
### The Bug and its Fix

The old code contained a bug that was degrading the decoder's accuracy. The loop that updates the fired detector counts and blocks errors had an inconsistency when the `--at-most-two-errors-per-detector` flag was enabled: the number of fired detectors was updated on `next_detector_cost_tuples`, but the blocking of errors was performed on `next_next_detector_cost_tuples`. When the `get_detcost` function was called, it used `next_next_detector_cost_tuples`, which did not have the correct fired detector counts. This inconsistency explains why some benchmarks produced low-confidence results and errors (e.g., 3 errors in a Surface Code benchmark). The new code performs both updates on the single `next_detector_cost_tuples` vector.
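As a hypothetical reconstruction of the old and new loops (the iteration structure, the `errors_for_detector` helper, and the `c >= 2` condition are illustrative assumptions, not the actual _Tesseract_ source), the contrast looks like this:

```cpp
#include <cstdint>
#include <vector>

struct DetectorCostTuple { uint32_t error_blocked; uint32_t detectors_count; };

// Old version: the two halves of the state were written to *different*
// vectors, so get_detcost() later read stale detectors_count values.
void update_old(const std::vector<std::vector<std::size_t>>& errors_for_detector,
                const std::vector<std::size_t>& fired_detectors,
                bool at_most_two_errors_per_detector,
                std::vector<DetectorCostTuple>& next_detector_cost_tuples,
                std::vector<DetectorCostTuple>& next_next_detector_cost_tuples) {
  for (std::size_t d : fired_detectors) {
    for (std::size_t e : errors_for_detector[d]) {
      uint32_t c = ++next_detector_cost_tuples[e].detectors_count;   // counted here...
      if (at_most_two_errors_per_detector && c >= 2) {
        next_next_detector_cost_tuples[e].error_blocked = 1;         // ...blocked there
      }
    }
  }
}

// New version: a single vector receives both updates, and flag-induced
// blocks use the revertible marker value 2.
void update_new(const std::vector<std::vector<std::size_t>>& errors_for_detector,
                const std::vector<std::size_t>& fired_detectors,
                bool at_most_two_errors_per_detector,
                std::vector<DetectorCostTuple>& next_detector_cost_tuples) {
  for (std::size_t d : fired_detectors) {
    for (std::size_t e : errors_for_detector[d]) {
      uint32_t c = ++next_detector_cost_tuples[e].detectors_count;
      if (at_most_two_errors_per_detector && c >= 2) {
        next_detector_cost_tuples[e].error_blocked = 2;  // reverted in the next state
      }
    }
  }
}
```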
By completely removing the `next_next_detector_cost_tuples` vector and performing all modifications on a single vector, this PR implicitly fixes the bug and ensures consistency between the fired detector counts and the blocked-error flags.

### Benchmarking the Performance and Bug Fix
The benchmark results clearly show that a huge amount of time was lost on these expensive copy operations when using the `--at-most-two-errors-per-detector` flag. This was especially apparent in the Surface Code benchmark (r=11, p=0.002, 500 shots). The degradation escalated as previous optimizations increased the size of the copied data structures.

The bug fix also improved accuracy: the Surface Code benchmark (r=11, p=0.002, 500 shots) that previously produced 3 errors now produces 0 errors.
### Analyzing the Impact of the Flag Itself
With the performance and correctness issues fixed, I performed additional experiments to analyze the intrinsic impact of the `--at-most-two-errors-per-detector` flag. The goal was to understand its effect on performance and accuracy now that the underlying implementation is optimized. The results show consistent and surprising behavior: the flag provides somewhat better accuracy but lower performance. This is the opposite of a typical heuristic, which is meant to trade some accuracy for a performance gain.

To collect more comprehensive data, I performed additional experiments on various groups of code families. These experiments confirmed the initial findings: the flag provides better accuracy at the cost of performance, with a decoding slowdown ranging from 0.2% to 69%.
### Conclusion: This Heuristic is not an Optimization
I performed a final benchmark on a specific case that previously showed a performance benefit when using the flag. The command used for this benchmark was:

```
bazel build src:all && time ./bazel-bin/src/tesseract --pqlimit 200000 --beam 5 --num-det-orders 20 --sample-num-shots 20 --det-order-seed 13267562 --circuit testdata/colorcodes/r=9,d=9,p=0.002,noise=si1000,c=superdense_color_code_X,q=121,gates=cz.stim --sample-seed 717347 --threads 1 --print-stats
```

Before the optimizations in PR #34, the execution time was 75.91 seconds without the flag and 74.23 seconds with it. After applying the `get_detcost` optimizations from PR #34, the execution time is 69.01 seconds without the flag and 72.98 seconds with it.

This demonstrates that the speedup from the optimizations in PR #34 (a gain of ~6.9 seconds) far exceeds the speedup the heuristic flag initially provided (a gain of ~1.68 seconds). The flag's intended benefit of pruning the search space is now outweighed by the efficiency gains in the core decoding function, which is why the flag-specific degradation needed to be addressed directly; with it fixed, the optimization work provides a high speedup in all scenarios.

My conclusion is that the current version of the Tesseract algorithm is faster without this flag. The next logical step may be to remove the flag entirely, but I am leaving that decision to the original implementers of the flag.
### Key Contributions

* Removed the inefficient state-management logic used when the `--at-most-two-errors-per-detector` flag is enabled. This logic performed redundant copy operations on large vectors instead of a smarter method of reverting changes. This fixes the last remaining performance degradation that was specific to using the heuristic flag.
* Fixed a previously unknown bug (inconsistent updates on `next_detector_cost_tuples` and `next_next_detector_cost_tuples`) that occurred only when the `--at-most-two-errors-per-detector` flag was enabled and was causing accuracy issues and low-confidence results.
* Implemented a cheap revert mechanism based on a special `2` value in the `DetectorCostTuple` structure, made possible by previous data representation changes.