
@draganaurosgrbic
Contributor

@draganaurosgrbic draganaurosgrbic commented Jul 7, 2025

Fixing a Performance Bottleneck and a Related Bug When Using the --at-most-two-errors-per-detector Flag

This PR addresses a critical performance issue and a related bug that existed when the --at-most-two-errors-per-detector flag was enabled. The core of the problem was the inefficient copying of large std::vector data structures, which was needed for state management (mentioned in #27) but whose cost became prohibitive once other parts of the decoder had been optimized. This PR removes these costly copy operations entirely and, in doing so, implicitly fixes a bug that was degrading decoder accuracy.


The Performance Issue (and also a Bug)

This PR resolves a performance degradation that was present only when the --at-most-two-errors-per-detector flag was used. While the code path with these redundant copy operations existed from the beginning, its impact was initially masked by other bottlenecks. After significant speedups were achieved elsewhere through the optimizations in PRs #25, #27, and #34, this flag-specific degradation became the dominant remaining bottleneck; the cost of the copy operations also escalated as those PRs enlarged the underlying data structures. The current work resolves this last remaining degradation, ensuring that runs with the flag enabled achieve speedups consistent with the other scenarios.


The Technical Solution

To solve this, I replaced the expensive copy-and-revert strategy with smarter state management. Instead of making a copy, I now store a special value (2) in the error_blocked field of the DetectorCostTuple struct. This sentinel indicates that an error was blocked specifically due to the --at-most-two-errors-per-detector flag and should be unblocked (or "reverted") in the next search state. This is possible because the error_blocked field is a uint32_t, allowing it to hold values beyond a simple true/false (1 or 0).

This new approach completely removes the need for the next_next_detector_cost_tuples vector. Instead, all changes are now made on a single vector (next_detector_cost_tuples), which is much more efficient.
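
The sketch below illustrates the scheme (a minimal, hypothetical rendering, not the exact Tesseract source; only the `DetectorCostTuple` field names and the sentinel semantics are taken from this PR):

```cpp
// Minimal sketch of the sentinel-value scheme. The struct fields follow the
// DetectorCostTuple described above; the helper functions are illustrative.
#include <cstdint>
#include <vector>

struct DetectorCostTuple {
  uint32_t error_blocked;   // 0 = free, 1 = blocked, 2 = blocked only by the flag
  uint32_t detectors_count;
};

// Block an error because of --at-most-two-errors-per-detector: use the
// sentinel 2 unless the error is already permanently blocked (1).
inline void block_for_flag(DetectorCostTuple& t) {
  t.error_blocked = (t.error_blocked == 1) ? 1 : 2;
}

// When advancing to the next search state, revert only the flag-induced
// blocks in place -- no copy of the vector is needed.
inline void revert_flag_blocks(std::vector<DetectorCostTuple>& tuples) {
  for (auto& t : tuples) {
    if (t.error_blocked == 2) t.error_blocked = 0;
  }
}
```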


The Bug and its Fix

The old code contained a bug that was degrading the decoder's accuracy. The following loop:

```cpp
for (int d : edets[ei]) {
  next_detectors[d] = !next_detectors[d];
  int fired = next_detectors[d] ? 1 : -1;
  next_num_detectors += fired;
  // Fired-detector counts are updated on next_detector_cost_tuples...
  for (int oei : d2e[d]) {
    next_detector_cost_tuples[oei].detectors_count += fired;
  }

  if (!next_detectors[d] && config.at_most_two_errors_per_detector) {
    // ...but the blocking flags are written to a *different* vector,
    // next_next_detector_cost_tuples, leaving the two inconsistent.
    for (size_t oei : d2e[d]) {
      next_next_detector_cost_tuples[oei].error_blocked = true;
    }
  }
}
```

had an inconsistency when the --at-most-two-errors-per-detector flag was enabled: the fired-detector counts were updated on next_detector_cost_tuples, while the blocking of errors was performed on next_next_detector_cost_tuples. When the get_detcost function was later called, it used next_next_detector_cost_tuples, which did not carry the correct fired-detector counts. This inconsistency explains why some benchmarks produced low-confidence results and errors (e.g., 3 errors in a Surface Code benchmark).

The new code, which replaces the above loop, is as follows:

```cpp
for (int d : edets[ei]) {
  next_detectors[d] = !next_detectors[d];
  int fired = next_detectors[d] ? 1 : -1;
  next_num_detectors += fired;
  for (int oei : d2e[d]) {
    next_detector_cost_tuples[oei].detectors_count += fired;
  }

  if (!next_detectors[d] && config.at_most_two_errors_per_detector) {
    for (size_t oei : d2e[d]) {
      // Keep permanent blocks (1); mark flag-induced blocks with the
      // sentinel 2 so they can be reverted in the next search state.
      next_detector_cost_tuples[oei].error_blocked =
          next_detector_cost_tuples[oei].error_blocked == 1 ? 1 : 2;
    }
  }
}
```

By completely removing the next_next_detector_cost_tuples vector and performing all modifications on a single vector, this PR implicitly fixes the bug and keeps the fired-detector counts consistent with the blocked-error flags.


Benchmarking the Performance and Bug Fix

The results below clearly show a huge amount of time was lost on these expensive copy operations when using the --at-most-two-errors-per-detector flag. This was especially apparent in the Surface Code benchmark (r=11, p=0.002, 500 shots). This performance degradation escalated as the size of the data structures increased with previous optimizations.

The bug fix also improved accuracy. **For the Surface Code benchmark (r=11, p=0.002, 500 shots) that previously had 3 errors, the new code produced 0 errors.**

[Benchmark plots: img1–img4]

Analyzing the Impact of the Flag Itself

With the performance and correctness issues fixed, I performed additional experiments to analyze the intrinsic impact of the --at-most-two-errors-per-detector flag. The goal was to understand its effect on performance and accuracy now that the underlying implementation is optimized.

The results from these benchmarks show consistent but counterintuitive behavior: the flag provides somewhat better accuracy but lower performance. This is the opposite of a typical heuristic, which trades some accuracy for a performance gain.

[Benchmark plots: img5–img8]

To collect more comprehensive data, I performed additional experiments on various groups of code families. These experiments confirmed the initial findings: the flag provides better accuracy at the cost of performance, with a decoding slowdown ranging from 0.2% to 69%.

[Benchmark plots: img9–img16]

Conclusion: This Heuristic is not an Optimization

I performed a final benchmark on a specific case that previously showed a performance benefit when using the flag. The command used for this benchmark was:

```bash
bazel build src:all && time ./bazel-bin/src/tesseract --pqlimit 200000 --beam 5 \
  --num-det-orders 20 --sample-num-shots 20 --det-order-seed 13267562 \
  --circuit testdata/colorcodes/r=9,d=9,p=0.002,noise=si1000,c=superdense_color_code_X,q=121,gates=cz.stim \
  --sample-seed 717347 --threads 1 --print-stats
```

Before the optimizations in PR #34, the execution time without the flag was 75.91 seconds, while using the flag was 74.23 seconds. However, after applying the get_detcost optimizations from PR #34, the execution time without the flag was 69.01 seconds, while with the flag it was 72.98 seconds.

This demonstrates that the speedup from the optimizations in PR #34 (a gain of ~6.9 seconds) far exceeds the speedup this heuristic flag initially provided (a gain of ~1.68 seconds). The flag's intended benefit of pruning the search space is now outweighed by the efficiency gains in the core decoding function. With the flag-specific degradation fixed, the optimization work provides a high speedup in all scenarios.

My conclusion is that the current version of the Tesseract algorithm is faster without using this flag. The next logical step may be to remove the flag entirely, but I am leaving that decision to the original implementers of the flag.


Key Contributions

  • Performance Fix: Removed a performance degradation in the code path for the --at-most-two-errors-per-detector flag, which performed redundant copy operations on large vectors instead of reverting changes in place. This was the last remaining degradation specific to the heuristic flag.
  • Bug Fix: Identified and fixed a bug that occurred when the --at-most-two-errors-per-detector flag was enabled and was causing accuracy issues and low-confidence results.
  • Intelligent Solution: Implemented a smarter, copy-free state management strategy using a sentinel value of 2 in the DetectorCostTuple structure, made possible by earlier data-representation changes.
  • Extensive Benchmarking: Conducted extensive experiments to evaluate the performance and bug fixes and to analyze the intrinsic behavior of the flag itself.
  • Informed Future Development: Provided comprehensive data showing that the flag no longer provides a performance benefit and that its behavior is contrary to its purpose as a heuristic, giving the team a strong basis to decide on its future.

@draganaurosgrbic draganaurosgrbic requested a review from LalehB July 7, 2025 20:05
Collaborator

@LalehB LalehB left a comment


Thanks, Dragana!
Could you include some benchmarking results for both performance and accuracy? It would be good to quantify the impact of this change.

@draganaurosgrbic
Contributor Author

draganaurosgrbic commented Jul 8, 2025

@LalehB Please see the updated description of the PR. It should answer all of your questions.

@LalehB
Collaborator

LalehB commented Jul 9, 2025

> @LalehB see the updated description of the PR. It should answer all of your questions.

Thank you, @draganaurosgrbic, for addressing the comments. I noticed that you're reporting the number of low-confidence events to quantify accuracy — could you also include the error count for completeness?
Additionally, I was wondering how much enabling the --at-most-two-errors-per-detector flag impacts accuracy in these benchmarks. Have you observed any noticeable differences with or without this flag?

@noajshu
Contributor

noajshu commented Jul 9, 2025

This is a pretty incredible speedup when the flag is enabled!

Although the flag does affect accuracy, I wonder if we couldn't use a similar technique but allow more than 2, like --at-most-4-errors-per-detector, since that's going to have a much smaller impact on accuracy.

I also wonder just how much accuracy is lost for e.g. the large color codes.

@noajshu
Contributor

noajshu commented Jul 9, 2025

> I noticed that you're reporting the number of low-confidence events to quantify accuracy — could you also include the error count for completeness?

+1 could you share the error rates as well please?

@draganaurosgrbic
Contributor Author

draganaurosgrbic commented Jul 9, 2025

@LalehB Please check the PR again; I added plots that compare the speed and accuracy with and without the --at-most-two-errors-per-detector flag. For the benchmarks I analyzed in this PR, it provides somewhat better accuracy but lower performance.

@draganaurosgrbic
Contributor Author

draganaurosgrbic commented Jul 9, 2025

@LalehB @noajshu For the benchmarks I analyzed in this PR, the only errors I encountered were in the Surface Code benchmark (r=11, p=0.002, 500 shots): 3 errors before this PR and 0 after. This is noted (in bold) in the PR description.

@draganaurosgrbic
Contributor Author

draganaurosgrbic commented Jul 10, 2025

@LalehB Please see the updated description of the PR; it explains the improvement in the decoder's accuracy. There was a bug before this PR that occurred only when using this flag, and this PR also fixes it.


@LalehB
Collaborator

LalehB commented Jul 15, 2025

Thank you @draganaurosgrbic for the update to this PR, I am very confused by this data chart that you included:
[attached chart: decoding time with vs. without the flag]
the blue bar, which is labeled "At most two errors per detectors", is showing higher decoding time?? Does this mean this flag makes the decoding slower?

@draganaurosgrbic
Contributor Author

draganaurosgrbic commented Jul 15, 2025

@LalehB Yes, when I performed these benchmarks, I was getting lower performance when using this flag, but somewhat better accuracy (check the graphs for the accuracy/low confidence count).

@draganaurosgrbic
Contributor Author

draganaurosgrbic commented Jul 15, 2025

@LalehB Please also note the two very important contributions of this PR: it completely removes the expensive copying of large vectors, and it fixes a bug we didn't know existed before (the inconsistent updates to next_detector_cost_tuples and next_next_detector_cost_tuples when this flag is enabled).

@LalehB
Collaborator

LalehB commented Jul 16, 2025

> @LalehB Yes, when I performed these experiments/benchmarks, I was getting slower decoding times with this flag, but somewhat higher accuracy (check the graphs for the accuracy/low confidence count)

@draganaurosgrbic I thought the accuracy improvement was because of the bug fix of the "inconsistent update on next_detector_cost_tuples and next_next_detector_cost_tuples". However, would you please explain how the --at-most-two-errors-per-detector flag results in worse performance, since technically it is supposed to do less work? And if it makes performance worse, shouldn't we just remove the flag (since it was introduced to improve performance) and just fix the bug?

@draganaurosgrbic
Contributor Author

draganaurosgrbic commented Jul 16, 2025

@LalehB Please check the PR description again; there are two accuracy comparisons. The first compares accuracy before and after I fixed the performance issue/bug (the inconsistent update on next_detector_cost_tuples and next_next_detector_cost_tuples). The second, performed after that fix, compares accuracy when using the --at-most-two-errors-per-detector flag against not using the flag at all. The PR has plots for both.

@draganaurosgrbic
Contributor Author

draganaurosgrbic commented Jul 16, 2025

@LalehB The first accuracy comparison is before vs. after the bug fix. I realized there was a bug because, after I removed the redundant std::vector copy operations, I compared accuracy against the previous version and noticed a difference. That difference came from a bug we had not noticed before, which my change implicitly fixed. After fixing the bug, I ran a second comparison, with and without the flag, to analyze the flag's actual impact on performance and accuracy. Hope this makes sense.

@LalehB
Collaborator

LalehB commented Jul 16, 2025

@draganaurosgrbic I think it would be good to touch base on some of the project goals here:

  • If there's a bug, it's always worth fixing it.
  • It's important to separate the impact of a bug fix on accuracy from the impact of a heuristic.
  • When we introduce a heuristic flag, we should clarify what it's supposed to achieve. For example:
    ◦ Is the accuracy improvement coming from the heuristic itself, or from a bug fix?
    ◦ If it's due to the bug fix and not the heuristic, we need to evaluate the heuristic on its own.
    ◦ In general, if a heuristic reduces computation, we should expect it to improve performance. If it doesn't, we need to understand why. (I think the impact of the bug fix and the heuristic flag --at-most-two-errors-per-detector are mixed up here.)

@draganaurosgrbic
Contributor Author

draganaurosgrbic commented Jul 16, 2025

@LalehB Thank you for your feedback. Maybe my PR description is not clear enough. Please read this:

  1. I started this PR with the plan to fix the performance issue I explained. I did not know there was a bug; I only realized there was one when I fixed the performance issue.
  2. When I discovered and fixed the performance issue, I realized that the fix also resolved a bug we had not noticed before. I saw this by comparing accuracy before and after fixing the performance issue (and, implicitly, the bug). I wanted to document that, since it is important to understand everything my code changes affect; that is why I also explain the logic behind the bug fix.
  3. After I optimized and fixed this flag, I wanted to understand its impact on performance and accuracy, since we had not discussed this flag at our meetings at all.
  4. I did not introduce or implement this flag; it existed before. I discovered a performance issue when using it, fixed that issue, fixed the pre-existing bug along the way, and documented everything.
  5. The only thing that might be missing in this PR is an explanation of why this flag improves only accuracy, not performance (on these specific benchmarks). To make sure I did not regress Tesseract's performance during my internship, I also benchmarked with and without the flag on the version of Tesseract from before my changes: even then, the flag gave worse performance but better accuracy on these benchmarks. Maybe we could test on other benchmarks? Does the rest of the team have experiments from when the flag was first implemented where they observed better performance?

> I think the impact of the bug fix and the heuristic flag --at-most-two-errors-per-detector are mixed up here

No, this PR has two logical parts:

  1. Fixing the performance issue (which also fixed the bug) -> a separate accuracy comparison, before vs. after the fix -> the change in accuracy is the result of fixing the bug.
  2. After the --at-most-two-errors-per-detector flag was actually fixed, understanding its impact on performance and accuracy -> a separate accuracy comparison, after the bug fix, with vs. without the flag -> on these benchmarks the flag provides better accuracy but not better performance.

@draganaurosgrbic
Contributor Author

draganaurosgrbic commented Jul 16, 2025

> It's important to separate the impact of a bug fix on accuracy from the impact of a heuristic.
> Is the accuracy improvement coming from the heuristic itself, or from a bug fix?
> If it's due to the bug fix and not the heuristic, we need to evaluate the heuristic on its own.

@LalehB Please check the PR description again; these graphs show the accuracy improvement coming from the performance fix (and, implicitly, the bug fix). They are already included in the PR description.

[Screenshots: accuracy before vs. after the fix]

@draganaurosgrbic
Contributor Author

draganaurosgrbic commented Jul 16, 2025

> It's important to separate the impact of a bug fix on accuracy from the impact of a heuristic.
> Is the accuracy improvement coming from the heuristic itself, or from a bug fix?
> If it's due to the bug fix and not the heuristic, we need to evaluate the heuristic on its own.

@LalehB Please check the PR description again; these graphs show the accuracy improvement coming from the heuristic itself. They are already included in the PR description.

[Screenshots: accuracy with vs. without the flag]

@draganaurosgrbic
Contributor Author

draganaurosgrbic commented Jul 16, 2025

@LalehB Hope this makes sense now. All of these graphs are already included in the PR description. I thought the logical flow of all of this was clear from the description; I have restructured it, so it should be clearer now.

@draganaurosgrbic draganaurosgrbic changed the title Optimized way of reverting blocked errors when --at-most-two-errors-per-detector enabled Fixing the performance issue/bug in copying large vectors when using --at-most-two-errors-per-detector flag Jul 18, 2025
@draganaurosgrbic draganaurosgrbic requested a review from noajshu July 19, 2025 03:08
@draganaurosgrbic
Contributor Author

draganaurosgrbic commented Jul 19, 2025

@LalehB @noajshu The PR description is updated: it now includes more comprehensive data analyzing the flag's impact on performance and accuracy, a summary list of the major contributions, and the results for the benchmark we looked at during our last meeting. I have also updated the graphs to be clearer. If you run into any issues, please let me know.

Collaborator

@LalehB LalehB left a comment


LGTM!

@draganaurosgrbic draganaurosgrbic merged commit 181528e into main Jul 25, 2025
4 checks passed
@draganaurosgrbic draganaurosgrbic deleted the optimization-cpu branch July 25, 2025 00:45
@draganaurosgrbic draganaurosgrbic removed the request for review from noajshu July 26, 2025 10:04
@draganaurosgrbic draganaurosgrbic changed the title Fixing the performance issue/bug in copying large vectors when using --at-most-two-errors-per-detector flag Fixing the performance issue/bug in copying large vectors when --at-most-two-errors-per-detector flag enabled Jul 26, 2025
draganaurosgrbic added a commit that referenced this pull request Jul 30, 2025
### Hashing Syndrome Patterns with `boost::dynamic_bitset`
In this PR, I address a key performance bottleneck: the hashing of fired-detector patterns (syndrome patterns). I introduce `boost::dynamic_bitset` from the Boost library, a data structure that combines the memory-saving bit packing of `std::vector<bool>` with highly optimized bit-wise operations, enabling access and modification as fast as `std::vector<char>`. Crucially, `boost::dynamic_bitset` also provides optimized built-in functions for efficiently hashing sequences of boolean elements.

---

### Initial Optimization: `std::vector<bool>` to `std::vector<char>`
The initial _Tesseract_ implementation, as documented in #25, utilized
`std::vector<bool>` to store patterns of fired detectors and predicates
that block specific errors from being added to the current error
hypothesis. While `std::vector<bool>` optimizes memory usage by packing
elements into individual bits, accessing and modifying its elements is
highly inefficient due to its reliance on proxy objects that perform
costly bit-wise operations (shifting, masking). Given _Tesseract_'s
frequent access and modification of these elements, this caused
significant performance overheads.

In #25, I transitioned from `std::vector<bool>` to `std::vector<char>`.
This change made boolean elements addressable bytes, enabling efficient
and direct byte-level access. Although this increased memory footprint
(as each boolean was stored as a full byte), it delivered substantial
performance gains by eliminating `std::vector<bool>`'s proxy objects and
their associated overheads for element access and modification. Speedups
achieved with this initial optimization were significant:
* For Color Codes, speedups reached 17.2%-32.3%
* For Bivariate-Bicycle Codes, speedups reached 13.0%-22.3%
* For Surface Codes, speedups reached 33.4%-42.5%
* For Transversal CNOT Protocols, speedups reached 12.2%-32.4%

These significant performance gains highlight the importance of choosing
appropriate data structures for boolean sequences, especially in
performance-sensitive applications like _Tesseract_. The remarkable
42.5% speedup achieved in Surface Codes with this initial switch
underscores the substantial overhead caused by unsuitable data
structures. The performance gain from removing `std::vector<bool>`'s
proxy objects and their inefficient operations far outweighed any
overhead from increased memory consumption.
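
To make the overhead concrete, here is a minimal illustrative comparison (not Tesseract code) of the element-access patterns involved:

```cpp
// std::vector<bool> packs bits and returns a proxy object from operator[],
// so every access pays for bit shifting/masking; std::vector<char> reads
// and writes whole addressable bytes directly.
#include <cstddef>
#include <vector>

void toggle_all_bool(std::vector<bool>& v) {
  for (std::size_t i = 0; i < v.size(); ++i)
    v[i] = !v[i];  // goes through std::vector<bool>::reference, a bit proxy
}

void toggle_all_char(std::vector<char>& v) {
  for (std::size_t i = 0; i < v.size(); ++i)
    v[i] = !v[i];  // plain byte load/store; no proxy, no bit arithmetic
}
```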

---

### Current Bottleneck: `std::vector<char>` and Hashing
Following the optimizations in #25, _Tesseract_ continued to use
`std::vector<char>` for storing and managing patterns of fired detectors
and predicates that block errors. Subsequently, PR #34 replaced and
merged vectors of blocked errors into the `DetectorCostTuple` structure,
which efficiently stores `error_blocked` and `detectors_count` as
`uint32_t` fields (reasons explained in #34). These changes left vectors
of fired detectors as the sole remaining `std::vector<char>` data
structure in this context.

After implementing and evaluating optimizations in #25, #27, #34, and
#45, profiling _Tesseract_ to analyze remaining bottlenecks revealed
that, aside from the `get_detcost` function, a notable bottleneck
emerged: `VectorCharHash` (originally `VectorBoolHash`). This function
is responsible for hashing patterns of fired detectors to prevent
re-exploring previously visited syndrome states. The implementation of
`VectorCharHash` involved iterating through each element, byte by byte,
and accumulating the hash. Even though this function saw significant
speedups with the initial switch from `std::vector<bool>` to
`std::vector<char>`, hashing patterns of fired detectors still consumed
considerable time. Post-optimization profiling (after #25, #27, #34, and
#45) revealed that this hashing function consumed approximately 25% of
decoding time in Surface Codes, 30% in Transversal CNOT Protocols, 10%
in Color Codes, and 2% in Bivariate-Bicycle Codes (`get_detcost`
remained the primary bottleneck for Bivariate-Bicycle Codes). Therefore,
I decided to explore opportunities to further optimize this function and
enhance the decoding speed.
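
For reference, a byte-by-byte hasher in the spirit of `VectorCharHash` looks roughly like the sketch below (a hypothetical reconstruction; the constants and combine step follow the common FNV-1a scheme and may differ from the actual implementation):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct VectorCharHash {
  std::size_t operator()(const std::vector<char>& v) const {
    std::uint64_t h = 1469598103934665603ull;   // FNV-1a offset basis
    for (char c : v) {                          // one step per byte, even for
      h ^= static_cast<unsigned char>(c);       // long, mostly-zero syndromes
      h *= 1099511628211ull;                    // FNV-1a prime
    }
    return static_cast<std::size_t>(h);
  }
};
```

Whatever the exact combine step, the cost is linear in the number of detectors with one loop iteration per byte, which is what showed up in the profiles above.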

---

### Solution: Introducing `boost::dynamic_bitset`
This PR addresses the performance bottleneck of hashing fired detector
patterns and mitigates the increased memory footprint from the initial
switch to `std::vector<char>` by introducing the `boost::dynamic_bitset`
data structure. The C++ standard library's `std::bitset` offers an ideal
conceptual solution: memory-efficient bit-packed storage (like
`std::vector<bool>`) combined with highly efficient access and
modification operations (like `std::vector<char>`). This data structure
achieves efficient access and modification by employing highly optimized
bit-wise operations, thereby reducing performance overhead stemming from
proxy objects in `std::vector<bool>`. However, `std::bitset` requires a
static size (determined at compile-time), rendering it unsuitable for
_Tesseract_'s dynamically sized syndrome patterns.

The Boost library's `boost::dynamic_bitset` provides the perfect
solution by offering dynamic-sized bit arrays whose dimensions can be
determined at runtime. This data structure brilliantly combines the
memory efficiency of `std::vector<bool>` (by packing elements into
individual bits) with the performance benefits of direct element access
and modification, similar to `std::vector<char>`. This is achieved by
internally storing bits within a contiguous array of fundamental integer
types (e.g., `unsigned long` or `uint64_t`) and accessing/modifying
elements using highly optimized bit-wise operations, thus avoiding the
overheads of `std::vector<bool>`'s proxy objects and costly bit-wise
operations. Furthermore, `boost::dynamic_bitset` offers highly
optimized, built-in hashing functions, replacing our custom, less
efficient byte-by-byte hashing and resulting in a cleaner, faster
implementation.
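
A minimal usage sketch (assuming a Boost version that ships `boost::hash_value` support for `dynamic_bitset`; the identifiers here are illustrative, not the exact Tesseract names):

```cpp
#include <boost/dynamic_bitset.hpp>
#include <boost/functional/hash.hpp>  // boost::hash picks up hash_value via ADL
#include <unordered_set>

int main() {
  boost::dynamic_bitset<> detectors(1000);  // size chosen at runtime
  detectors.set(42);                        // direct bit operations,
  detectors.flip(7);                        // no proxy-object overhead

  // Visited-state set keyed by the whole bit-packed pattern: the hash walks
  // machine words instead of individual bytes.
  std::unordered_set<boost::dynamic_bitset<>,
                     boost::hash<boost::dynamic_bitset<>>> visited;
  visited.insert(detectors);
  return visited.count(detectors) ? 0 : 1;
}
```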

---

### Performance Evaluation: Individual Impact of Optimization
I performed two types of experiments to evaluate the achieved
performance gains. First, I conducted extensive benchmarks across
various code families and configurations to evaluate the individual
performance gains achieved by this specific optimization. Speedups
achieved include:
* For Surface Codes: 8.0%-24.7%
* For Transversal CNOT Protocols: 12.1%-26.8%
* For Color Codes: 3.6%-7.0%
* For Bivariate-Bicycle Codes: 0.5%-4.8%

These results highlight the highest impact in Surface Codes and
Transversal CNOT Protocols, which aligns with the initial profiling data
that showcased these code families were spending more time in the
original `VectorCharHash` function.

---

#### Speedups in Surface Codes

<img width="1990" height="989" alt="img1"
src="https://github.com/user-attachments/assets/04044da5-a980-4282-a6fe-4debfa815f41"
/>

---

#### Speedups in Transversal CNOT Protocols

<img width="1990" height="989" alt="img2"
src="https://github.com/user-attachments/assets/f79e4d7d-5cfc-4077-be1a-13ef92a2d65a"
/>

<img width="1990" height="989" alt="img3"
src="https://github.com/user-attachments/assets/35a9b672-07d3-45ea-9334-23dd85760925"
/>

---

#### Speedups in Color Codes

<img width="1990" height="989" alt="img4"
src="https://github.com/user-attachments/assets/2b52c4fd-5137-47f0-9bae-7c667c740ff0"
/>

<img width="1990" height="989" alt="img5"
src="https://github.com/user-attachments/assets/e7883dec-5a88-4b2b-914b-3d12a1843d6f"
/>

---

#### Speedups in Bivariate-Bicycle Codes

<img width="1990" height="989" alt="img6"
src="https://github.com/user-attachments/assets/bd530a3b-da17-4ac1-bf68-702aaafe6047"
/>

<img width="1990" height="989" alt="img7"
src="https://github.com/user-attachments/assets/2d2f2576-0b16-4f0a-b8a2-221723250945"
/>

---

### Performance Evaluation: Cumulative Speedup
Following the evaluation of individual performance gains, I analyzed the
cumulative effect of the optimizations implemented across PRs #25, #27,
#34, and #45. The cumulative speedups achieved are:
* For Color Codes: 40.7%-54.8%
* For Bivariate-Bicycle Codes: 41.5%-80.3%
* For Surface Codes: 50.0%-62.4%
* For Transversal CNOT Protocols: 57.8%-63.6%

These results demonstrate that my optimizations achieved over 2x speedup
in Color Codes, over 2.5x speedup in Surface Codes and Transversal CNOT
Protocols, and over 5x speedup in Bivariate-Bicycle Codes.

---

#### Speedups in Color Codes

<img width="1990" height="989" alt="img1"
src="https://github.com/user-attachments/assets/cd81dc98-8599-4740-b00c-4ff396488f69"
/>

<img width="1990" height="989" alt="img2"
src="https://github.com/user-attachments/assets/c337ddcf-44f0-4641-91df-2a6d3c586680"
/>

---

#### Speedups in Bivariate-Bicycle Codes

<img width="1990" height="989" alt="img3"
src="https://github.com/user-attachments/assets/a57cf9e2-4c2c-44e8-8a6e-1860b1544cbd"
/>

<img width="1990" height="989" alt="img4"
src="https://github.com/user-attachments/assets/fde60159-fd7f-4893-b30d-34da844ac452"
/>

---

#### Speedups in Surface Codes

<img width="1990" height="989" alt="img5"
src="https://github.com/user-attachments/assets/57234d33-201b-41a9-b867-15e9ff87e666"
/>

---

#### Speedups in Transversal CNOT Protocols

<img width="1990" height="989" alt="img6"
src="https://github.com/user-attachments/assets/5780843d-2055-4870-9454-50184a268ad1"
/>

---

### Conclusion
These results demonstrate that the `boost::dynamic_bitset` optimization
significantly impacts code families where the original hashing function
(`VectorCharHash`) was a primary bottleneck (Surface Codes and
Transversal CNOT Protocols). The substantial speedups achieved in these
code families validate that `boost::dynamic_bitset` provides
demonstrably more efficient hashing and bit-wise operations. For code
families where hashing was less of a bottleneck (Color Codes and
Bivariate-Bicycle Codes), the speedups were modest, reinforcing that
`std::vector<char>` can remain highly efficient even with increased
memory usage when bit packing is not the primary performance concern.
Crucially, this optimization delivers comparable or superior performance
to `std::vector<char>` while simultaneously reducing memory footprint,
providing additional speedups where hashing performance is critical.

---

### Key Contributions
* Identified the hashing of syndrome patterns as the primary remaining
bottleneck in Surface Codes and Transversal CNOT Protocols, post prior
optimizations (#25, #27, #34, #45).
* Adopted `boost::dynamic_bitset` as a superior data structure, combining `std::vector<bool>`'s memory efficiency with high-performance bit-wise operations and built-in hashing, enabling access and modification as fast as `std::vector<char>`.
* Replaced `std::vector<char>` with `boost::dynamic_bitset` for storing
syndrome patterns.
* Performed extensive benchmarking to evaluate both the individual
impact of this optimization and its cumulative effect with prior PRs.
* Achieved significant individual speedups (e.g., 8.0%-24.7% in Surface
Codes, 12.1%-26.8% in Transversal CNOT Protocols) and substantial
cumulative speedups (over 2x in Color Codes, over 2.5x in Surface Codes
and Transversal CNOT Protocols, and over 5x in Bivariate-Bicycle Codes).

PR #47 contains the scripts I used for benchmarking and plotting the
results.

---------

Signed-off-by: Dragana Grbic <[email protected]>
Co-authored-by: noajshu <[email protected]>
Co-authored-by: LaLeh <[email protected]>
@NoureldinYosri NoureldinYosri mentioned this pull request Sep 5, 2025