error correction advanced · 19 min read · By LIPAI WANG · April 28, 2026

qLDPC Codes: The Surface Code Successor That Already Cuts Qubit Overhead by 10x

The surface code's d² qubit overhead is the dominant constant in every fault-tolerant resource estimate. Quantum low-density parity-check (qLDPC) codes can reach comparable logical error rates while packing many logical qubits into each block, which at useful code sizes often translates to roughly 10x fewer physical qubits per logical qubit. This tutorial covers the 2021-2024 breakthroughs (Panteleev-Kalachev; the IBM bicycle codes of Bravyi, Cross, and co-authors), the connectivity tradeoffs, and IBM's Starling roadmap built around them.

Prerequisites: Tutorial 19: The Surface Code and Willow, Tutorial 27: Resource Estimation

Tutorial 19 left you with the surface code as the dominant fault-tolerance scheme. Tutorial 27 quantified its overhead: $\sim 1{,}500$ physical qubits per logical qubit at code distance 27, leading to the famous $\sim 20$-million-qubit RSA-2048 estimate. That number is uncomfortably big. Cutting it by an order of magnitude would not just be an academic improvement; it would change the decade in which fault-tolerant quantum computing becomes practical.

That order of magnitude is what quantum low-density parity check codes (qLDPC) plausibly deliver. The surface code’s qubit overhead per logical qubit grows as $d^2$ in code distance; the best qLDPC constructions keep a constant rate while their distance grows linearly with block size, so the overhead per logical qubit stays bounded asymptotically, and the practical IBM “bicycle” codes from 2024 need roughly 10× fewer physical qubits than the surface code at comparable logical performance. IBM’s published “Starling” roadmap is explicitly built around running qLDPC at scale by 2029. This is arguably the most consequential development in fault-tolerant quantum computing of the last five years, and it has shifted the entire industry’s resource calculus.

This tutorial covers what qLDPC codes are, how the 2021-2024 breakthroughs got us here, what the practical tradeoffs are, and how to read a “we’re going qLDPC” architectural claim with appropriate skepticism.

Why surface-code overhead is what it is

Every stabilizer code has a triple of integer parameters $[[n, k, d]]$, where $n$ is the physical-qubit count, $k$ is the logical-qubit count, and $d$ is the code distance (the minimum weight of an error that changes the encoded information without triggering any stabilizer). The two figures of merit are:

Rate: $k / n$. How many logical qubits per physical qubit.
Relative distance: $d / n$. How well-protected each logical qubit is.

For the standard surface code on a single logical qubit ($k = 1$):

$n \approx 2 d^2$ (rotated layout, counting both data and measurement qubits)
Rate: $1 / (2 d^2)$, vanishing as $d$ grows.
Relative distance: $d / (2 d^2) = 1 / (2 d)$, also vanishing.

Both rate and relative distance go to zero in the asymptotic limit. This is the surface code’s “thumb in the dam” character: you pay quadratically more qubits per logical qubit for each unit of additional protection. It works, it has a clean planar layout, but the scaling is unforgiving.
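To make the scaling concrete, here is a minimal sketch that evaluates the two figures of merit for a single patch, using the $n \approx 2 d^2$ approximation from the list above. The function name `surface_code_figures` is an illustrative choice, not something from a library.

```python
def surface_code_figures(d: int) -> dict:
    """Figures of merit for one rotated surface-code patch (k = 1).

    Uses the n ~ 2 d^2 approximation from the text (data plus measurement qubits).
    """
    n = 2 * d * d
    return {"n": n, "rate": 1 / n, "relative_distance": d / n}


for d in (3, 11, 27):
    figures = surface_code_figures(d)
    print(
        f"d={d:2d}  n={figures['n']:5d}  "
        f"rate={figures['rate']:.5f}  "
        f"relative distance={figures['relative_distance']:.4f}"
    )
```

Both columns shrink as $d$ grows, which is the quadratic penalty described above; at $d = 27$ the approximation gives $n = 1{,}458$, close to the $\sim 1{,}500$ figure quoted from tutorial 27.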

The dream: a code family where rate stays constant ($k / n \to c_1 > 0$) and relative distance stays constant ($d / n \to c_2 > 0$) as $n$ grows. Such a family is called asymptotically good. For decades it was unclear whether asymptotically good quantum codes existed at all: the analogous classical objects (LDPC codes) had been known since Gallager 1962, but the quantum case is much harder because of the simultaneous-commutation requirement on stabilizers.

The qLDPC definition

A quantum LDPC code is a stabilizer code whose stabilizer generators are sparse: each generator acts on a constant number of physical qubits (independent of $n$), and each physical qubit participates in a constant number of generators.

Sparsity is the LDPC half of the name. The “Q” adds the requirement that the stabilizer generators commute pairwise (the standard stabilizer-code requirement).
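For CSS codes, both halves of the definition reduce to simple checks on two binary matrices: X-type and Z-type generators commute exactly when every pair of rows overlaps on an even number of qubits, and sparsity is a bound on row and column weights. The sketch below runs those checks on the Steane $[[7, 1, 3]]$ code, chosen here only because it is small enough to read. A single small code cannot show "constant as $n$ grows"; that is a property of a family. The helper names are illustrative.

```python
def row_weights(h):
    """Number of qubits each stabilizer generator acts on."""
    return [sum(row) for row in h]


def column_weights(h):
    """Number of generators (of this type) each qubit participates in."""
    return [sum(col) for col in zip(*h)]


def css_checks_commute(hx, hz):
    """True if every X-type row overlaps every Z-type row on an even number of qubits."""
    return all(
        sum(a & b for a, b in zip(row_x, row_z)) % 2 == 0
        for row_x in hx
        for row_z in hz
    )


# Parity-check matrix of the classical [7, 4] Hamming code.
# The Steane code uses it for both the X-type and the Z-type generators.
HAMMING = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
hx = hz = HAMMING

print("generators commute:", css_checks_commute(hx, hz))
print("max generator weight:", max(row_weights(hx)))
print("max generators per qubit (per type):", max(column_weights(hx)))
```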

Sparsity is what makes efficient decoding plausible. Decoding a stabilizer code amounts to finding the most likely error pattern given the syndrome. For sparse stabilizer generators, this becomes a belief-propagation problem on a sparse graph, where each iteration costs time linear in the block size. This is exactly the property that made classical LDPC codes practical for error correction in Wi-Fi and 5G. The quantum analog has the same structural advantage.

The surface code is itself qLDPC — its stabilizers are weight-4 plaquettes and vertices on a 2D lattice, sparse and local. What makes the surface code “the surface code” rather than just “a qLDPC code” is the extra geometric constraint of locality on a 2D plane. Drop that constraint and you can do much better.

The Panteleev-Kalachev breakthrough

In 2021, Panteleev and Kalachev proved the existence of asymptotically good quantum LDPC codes — code families with rate and relative distance both bounded away from zero in the limit. This solved the decades-old open question of whether such codes existed.

The construction uses lifted product codes built from non-abelian group structures. The technical details are heavy (the group theory and homological algebra alone fill several papers), but the upshot is a recipe: take a classical LDPC code with good rate and distance, lift it through a carefully chosen group action, and the result is a quantum LDPC code that inherits good rate and distance from the classical input.

Quantitatively, the Panteleev-Kalachev codes (and the closely related Leverrier-Zémor 2022 construction) achieve:

Rate: $k / n = c_1$ for some constant $c_1$ (the original construction has $c_1 \approx 1/15$).
Relative distance: $d / n = c_2$ for some constant $c_2$.
Stabilizer weight: constant.

For large $n$, this is dramatically better than the surface code’s $1/(2d^2)$ rate. At $n = 1500$ physical qubits, the surface-code rate is $1/1500$; a Panteleev-Kalachev code at the same $n$ has rate $\sim 1/15$, encoding ~100 logical qubits where the surface code encodes 1.

The catch: the asymptotic regime where these codes shine is large $n$. For modest $n$ (say, a few hundred physical qubits), the surface code is competitive. The Panteleev-Kalachev breakthrough is most relevant for the multi-thousand-qubit regime where actual fault-tolerant computers will live.

IBM bicycle codes: theory becomes practice

The 2021 Panteleev-Kalachev result was a theoretical milestone, but it did not immediately yield codes you would build hardware around. The construction is mathematically elegant and concretely awkward. The codes have all-to-all-ish connectivity requirements, awkward block sizes, and decoders that work in theory but had not been engineered.

In 2024, IBM (Bravyi, Cross, Gambetta, Maslov, Rall, Yoder) published bicycle codes, a much simpler qLDPC family designed to be implementable on near-term hardware. The construction takes two sparse matrices $A$ and $B$, each a sum of three monomials in a pair of commuting cyclic-shift matrices, and combines them into a quantum code via the CSS construction with $H_X = [A \mid B]$ and $H_Z = [B^T \mid A^T]$. The result is:

Practical block sizes: [[144, 12, 12]] codes encoding 12 logical qubits in 144 physical qubits with distance 12.
Bounded connectivity: each physical qubit interacts with 6 others (the surface code needs only 4, all of them nearest neighbors on a plane).
Concrete decoders: belief-propagation-with-ordered-statistics-decoding (BP-OSD) achieves logical error rates comparable to MWPM on the surface code, at least at the block sizes studied.
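The block-size and connectivity claims above can be checked directly. The following is a sketch under stated assumptions: it takes the polynomial form of the published construction as I understand it, with $x = S_{12} \otimes I_6$ and $y = I_{12} \otimes S_6$ as cyclic shifts, $A = x^3 + y + y^2$, $B = y^3 + x + x^2$, $H_X = [A \mid B]$ and $H_Z = [B^T \mid A^T]$. The function names and the bitmask representation are illustrative choices, not IBM's code. The script verifies $n$, $k$, commutation, and check weight; it does not attempt to verify the distance, which is much more expensive to compute.

```python
def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are stored as Python-int bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low_bit = pivot & -pivot  # lowest set bit of the pivot row
            # Clear that bit from every remaining row.
            rows = [r ^ pivot if r & low_bit else r for r in rows]
    return rank


def bivariate_bicycle(l, m, a_terms, b_terms):
    """Build H_X = [A | B] and H_Z = [B^T | A^T] as lists of bitmask rows.

    a_terms / b_terms list the monomials x^a y^b of A and B as (a, b) pairs.
    Qubit (i, j) in the left block has index i*m + j; the right block is offset by l*m.
    """
    half = l * m

    def idx(i, j):
        return (i % l) * m + (j % m)

    hx, hz = [], []
    for i in range(l):
        for j in range(m):
            row_x = row_z = 0
            for a, b in a_terms:
                row_x ^= 1 << idx(i + a, j + b)            # A, left block
                row_z ^= 1 << (half + idx(i - a, j - b))   # A^T, right block
            for a, b in b_terms:
                row_x ^= 1 << (half + idx(i + a, j + b))   # B, right block
                row_z ^= 1 << idx(i - a, j - b)            # B^T, left block
            hx.append(row_x)
            hz.append(row_z)
    return hx, hz


# Assumed parameters for the [[144, 12, 12]] code (see lead-in).
L_SIZE, M_SIZE = 12, 6
A_TERMS = [(3, 0), (0, 1), (0, 2)]   # x^3 + y + y^2
B_TERMS = [(0, 3), (1, 0), (2, 0)]   # y^3 + x + x^2

hx, hz = bivariate_bicycle(L_SIZE, M_SIZE, A_TERMS, B_TERMS)
n = 2 * L_SIZE * M_SIZE
k = n - gf2_rank(hx) - gf2_rank(hz)
commute = all(bin(rx & rz).count("1") % 2 == 0 for rx in hx for rz in hz)
check_weights = {bin(row).count("1") for row in hx + hz}

print(f"n = {n}, k = {k}, checks commute: {commute}, check weights: {check_weights}")
```

Commutation holds by construction: $H_X H_Z^T = AB + BA$, which vanishes mod 2 because the shift matrices commute. Note that `n` here counts data qubits only; syndrome extraction needs one additional check qubit per stabilizer generator.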

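The headline number from the IBM 2024 paper: at code distance 12, a bicycle code encodes 12 logical qubits in 144 data qubits, or 288 physical qubits once the 144 check qubits used for syndrome measurement are counted. A surface code at distance 12 needs $\sim 2 \cdot 144 = 288$ physical qubits per logical qubit on the same data-plus-measurement accounting. So 12 logical qubits in the surface code take $\sim 3{,}500$ physical qubits, vs. 288 for the bicycle code: a $\sim 12\times$ qubit-overhead reduction at this block size (or $\sim 24\times$ if only the bicycle code’s 144 data qubits are counted).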

IBM has published target performance numbers for larger bicycle codes that suggest the $\sim 10\times$ qubit overhead reduction holds across the size range relevant for fault-tolerant computing. RSA-2048 with bicycle codes would target $\sim 2$-$3$ million physical qubits instead of $\sim 20$ million, squarely in the range where current fab capability could plausibly deliver in the late 2020s.

The IBM Starling roadmap

In 2024-2025, IBM published a fault-tolerance roadmap explicitly built around bicycle codes:

2025: Heron r2 transmon devices, demonstrating the gate fidelities required (~$10^{-3}$ two-qubit error).
2026: Nighthawk system, the first IBM device with the connectivity needed for bicycle codes (each qubit connected to 6 neighbors via tunable couplers).
2027-2028: First demonstration of bicycle-code logical qubits at modest distance, on the order of $[[72, 6, 6]]$ .
2029: “Starling”, the target machine: ~$10{,}000$ physical qubits running bicycle codes to host ~200 logical qubits at distance ~12, sufficient for some fault-tolerant chemistry simulations and resource-estimable subroutines of larger algorithms.

The roadmap is the most public commitment by a major hardware company to qLDPC over the surface code. Google’s Willow program continues on the surface code; Quantinuum’s roadmap has remained somewhat agnostic; PsiQuantum is on its own photonic-fault-tolerance track. The IBM bet is the most concrete data point on whether qLDPC delivers in production by 2030.

The connectivity tradeoff

qLDPC codes’ overhead advantage comes with a cost: they need non-planar connectivity. The surface code’s stabilizers act on the 4 nearest neighbors of a 2D lattice site, which is perfectly compatible with planar superconducting hardware where each qubit has 4 neighbor connections. Bicycle codes need each qubit to interact with $\sim 6$ qubits, some of which are not nearest neighbors on a 2D plane.

Three engineering responses to this:

Tunable couplers and 3D integration. Modern superconducting platforms (IBM Nighthawk, Google’s pre-Willow architecture) increasingly use tunable couplers that can selectively gate between non-nearest-neighbor pairs. 3D integration (qubits and couplers on different chip layers) extends the available connectivity at the cost of fabrication complexity.
All-to-all connectivity hardware. Trapped ions and neutral atoms can move qubits physically, giving effectively all-to-all connectivity. This is “free” connectivity for qLDPC, at the cost of slower physical gates than superconducting.
Hybrid approaches. Photonic platforms (PsiQuantum, Xanadu) use measurement-based fault tolerance with different code families entirely; the qLDPC story is most directly relevant to gate-model superconducting and atomic platforms.

The IBM bet is that tunable-coupler-based superconducting hardware can deliver the degree-6 connectivity bicycle codes need without giving up its gate-speed advantage. The early Nighthawk results in 2026 are the test of this.

The decoder tradeoff

The surface code has a beautiful decoding story: minimum-weight perfect matching (MWPM) solves its matching problem exactly (although the minimum-weight answer is not always the maximum-likelihood one), runs in polynomial time, and has been engineered into specialized hardware decoders that keep up with $\mu$s-level cycle times. Two decades of research have polished MWPM until it is a deployable production tool.

qLDPC decoders are not yet at this maturity:

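Belief propagation is the natural starting point, since qLDPC codes are sparse-graph codes. But naive BP often fails to converge on quantum codes, because of short cycles in the Tanner graph and because degenerate errors (distinct errors with the same effect on the code) split its beliefs.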
BP with ordered-statistics decoding (BP-OSD) is the current standard for bicycle codes. It works, but it is computationally heavier than MWPM and harder to engineer into specialized hardware.
Tensor-network decoders are an active research direction with better accuracy but higher cost.
Neural decoders trained on simulated syndromes are an active research area; promising but not yet production-grade.

The 2024-2026 frontier is making qLDPC decoders run as fast as MWPM does on the surface code. Without that, the qubit-overhead win can be eaten by a need for slower cycles.
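To show what the message-passing core of these decoders looks like, here is a minimal min-sum belief-propagation sketch for syndrome decoding. It is deliberately a toy: the example runs on a classical 5-bit repetition code, whose Tanner graph is a tree, so BP converges immediately. It does not include the OSD post-processing step and says nothing about performance on bicycle codes. The function name, the uniform error prior `p`, and the flooding schedule are all illustrative assumptions.

```python
import math


def min_sum_decode(checks, syndrome, n_bits, p=0.1, max_iter=20):
    """Min-sum BP syndrome decoder (sketch).

    checks   : list of checks, each a list of the bit indices it touches (weight >= 2)
    syndrome : list of 0/1 values, one per check
    Returns (error_estimate, converged).
    """
    prior = math.log((1 - p) / p)  # log-likelihood ratio of "no error" for each bit
    v2c = {(c, v): prior for c, bits in enumerate(checks) for v in bits}
    c2v = {}
    estimate = [0] * n_bits

    for _ in range(max_iter):
        # Check-to-variable messages: sign from parity, magnitude from the weakest input.
        for c, bits in enumerate(checks):
            for v in bits:
                others = [v2c[(c, u)] for u in bits if u != v]
                sign = -1.0 if syndrome[c] else 1.0
                for msg in others:
                    if msg < 0:
                        sign = -sign
                c2v[(c, v)] = sign * min(abs(msg) for msg in others)

        # Posterior beliefs and hard decision.
        posterior = [prior] * n_bits
        for (c, v), msg in c2v.items():
            posterior[v] += msg
        estimate = [1 if llr < 0 else 0 for llr in posterior]

        # Stop as soon as the estimate reproduces the syndrome.
        if all(
            sum(estimate[v] for v in bits) % 2 == syndrome[c]
            for c, bits in enumerate(checks)
        ):
            return estimate, True

        # Variable-to-check messages: posterior minus the message from that check.
        for key in list(v2c):
            v2c[key] = posterior[key[1]] - c2v[key]

    return estimate, False


# 5-bit repetition code: check i compares bits i and i+1.
CHECKS = [[0, 1], [1, 2], [2, 3], [3, 4]]
true_error = [0, 0, 1, 0, 0]
syndrome = [sum(true_error[v] for v in bits) % 2 for bits in CHECKS]

estimate, converged = min_sum_decode(CHECKS, syndrome, n_bits=5)
print("syndrome:", syndrome)
print("estimate:", estimate, "converged:", converged)
```

On graphs with cycles the same update rules can oscillate or settle on an estimate that does not match the syndrome; in that case the function returns `converged=False`, which is the point where a BP-OSD decoder would hand over to its ordered-statistics stage.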

The logical-gate tradeoff

The surface code has a well-developed logical Clifford toolkit (tutorial 25): a transversal CNOT between patches and well-understood lattice-surgery constructions for the remaining operations. qLDPC codes are still developing this story:

Transversal gates on qLDPC codes are not as well-characterized as on the surface code. Whether bicycle codes inherit transversal Cliffords easily is a genuine open question that the IBM 2024 papers address only partially.
Code switching from qLDPC to a magic-state-friendly code is the standard plan for non-Clifford gates. This adds back some of the overhead the qLDPC construction was supposed to save.
Magic-state distillation factories still apply, with the same Eastin-Knill-driven cost. The factory area itself can use a separate code (or even a separate qLDPC family); IBM’s published estimates suggest this is the plan.

The net qubit overhead reduction from qLDPC, after accounting for these complications, is closer to 10x than 100x. That is still a transformative number for fault-tolerant scaling, but it is not a free lunch.

Updated resource estimate sketch

Re-running tutorial 27’s RSA-2048 estimate with bicycle-code-style qLDPC parameters:

Algorithm logical qubits: unchanged at ~2,400.
Physical qubits per logical qubit at $d \approx 27$: $\sim 150$ (roughly 10× lower than surface code’s 1,500).
Algorithm physical qubits: $\sim 360{,}000$ (10× lower).
Magic-state factory area: also reduces, perhaps to $\sim 1$-$2$ million qubits if the factory codes are also qLDPC.
Total physical qubits: $\sim 2$-$3$ million.
Wall-clock time: comparable to surface code (~hours), modulo decoder-speed differences.

The net result is a roughly 10× total qubit overhead reduction. This is the regime in which RSA-2048-cracking machines become engineering-feasible rather than aspirational. It is also the regime in which post-quantum cryptography migration becomes urgent rather than precautionary, and it is the reason 2024-2026 NIST/CISA timelines have tightened.
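The arithmetic behind the sketch is short enough to script. The block below only reproduces the line items quoted above; the constant names are illustrative. One caveat worth seeing in numbers: the two itemised entries (algorithm qubits plus factory area) sum to roughly 1.4 to 2.4 million, somewhat below the quoted $\sim 2$-$3$ million total, so the total presumably includes items not itemised in this list.

```python
LOGICAL_QUBITS = 2_400                      # algorithm logical qubits (from the list above)
SURFACE_PER_LOGICAL = 1_500                 # surface code, d ~ 27
QLDPC_PER_LOGICAL = 150                     # bicycle-code-style estimate from the list above
FACTORY_RANGE = (1_000_000, 2_000_000)      # quoted qLDPC factory-area range

surface_algorithm = LOGICAL_QUBITS * SURFACE_PER_LOGICAL
qldpc_algorithm = LOGICAL_QUBITS * QLDPC_PER_LOGICAL
itemised_total = tuple(qldpc_algorithm + factory for factory in FACTORY_RANGE)

print(f"surface-code algorithm qubits: {surface_algorithm:,}")
print(f"qLDPC algorithm qubits:        {qldpc_algorithm:,}")
print(f"algorithm-side reduction:      {surface_algorithm / qldpc_algorithm:.0f}x")
print(f"itemised qLDPC total:          {itemised_total[0]:,} to {itemised_total[1]:,}")
```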

Common misconceptions

“qLDPC has solved fault tolerance.” No. The 2021 Panteleev-Kalachev result solved the existence question (asymptotically good qLDPC codes exist). The 2024 bicycle-code result addressed the practical near-term question (qLDPC codes that beat the surface code in circuit-level simulations at hardware-relevant error rates exist). What remains: better decoders, better logical-gate constructions, hardware demonstrations at scale, integration with magic-state distillation. Each is an active research and engineering area.

“qLDPC will replace the surface code by 2027.” Probably not by 2027. By 2030, plausibly. The surface code has 20 years of accumulated decoder, fault-tolerance, and hardware-co-design work. qLDPC is starting from a much smaller base. Even if qLDPC is asymptotically better, the surface code may remain the production choice for the first generation of fault-tolerant machines while qLDPC matures.

“qLDPC eliminates magic-state distillation.” Not directly. Eastin-Knill (tutorial 26) applies to qLDPC codes too. Bicycle codes still need a non-transversal route to T gates, which still costs factory area. qLDPC reduces the algorithm-qubit overhead; it does not by itself reduce the factory cost. The 10× total reduction comes from the algorithm side and from being able to use smaller codes for the factories themselves.

“All qLDPC codes are bicycle codes.” Bicycle codes are one family. Other practical qLDPC families include Tanner codes, lifted-product codes, fiber-bundle codes, and a growing taxonomy of constructions. Each has different connectivity, decoder, and logical-gate tradeoffs. The 10× number comes specifically from bicycle codes; other families may do better or worse on different axes.

Decision rule

When you read a “we’re using qLDPC” or “we beat surface code overhead” claim, run this checklist:

Which qLDPC family? Bicycle, Tanner, Panteleev-Kalachev, Leverrier-Zémor, fiber bundle, something else? The family determines the connectivity, decoder, and logical-gate story.
What block size is the comparison at? qLDPC’s advantage is asymptotic; at small block sizes (e.g., 50 physical qubits) the surface code can be competitive or better. The 10× advantage typically requires multi-hundred-qubit blocks.
What decoder is being used, and what is its measured performance? BP-OSD numbers in simulation are not the same as fault-tolerant performance under realistic noise. Demand both.
What is the connectivity assumption? A bicycle code that needs degree-6 connectivity does not run on a degree-4 superconducting chip without 3D integration or tunable couplers. The connectivity has to match the hardware.
What is the magic-state-distillation plan? qLDPC reduces algorithm overhead, not factory overhead. The factory plan is still load-bearing for total resource estimates.
What is the hardware demonstration timeline? A 2024 theoretical paper is not a 2027 production system. Ask which milestones are demonstrated and which are projected.

A vendor proposal that survives all six is a credible architectural claim. As of 2026, the proposals that survive are mostly IBM’s Starling roadmap and a handful of academic-industrial collaborations. The qLDPC space is moving fast and is worth tracking.

Common practical workflow

If you are a working developer trying to use these codes in 2026:

Stim (Google) ships built-in circuit generators for the surface code; it can simulate qLDPC syndrome-extraction circuits too, but you have to construct them yourself.
PyMatching is the production decoder for surface codes; for qLDPC use BP-OSD via the ldpc Python package or the IBM-released decoder library.
Qiskit has experimental qLDPC support via the qiskit-qec package.
The IBM 2024 bicycle-code paper publishes its [[144, 12, 12]] construction explicitly; you can simulate it on a small cluster.

For learning and benchmarking, the surface code remains the entry point. For 2030-and-beyond resource estimates, bicycle codes are increasingly the reference family.

Exercises

1. Rate comparison

The surface code at $d = 27$ has $n \approx 1{,}500$ and $k = 1$, rate $\approx 0.0007$. A bicycle code at $d = 12$ has $n = 144$ and $k = 12$, rate $\approx 0.083$. Compute the qubit savings if you encode 100 logical qubits in surface code vs. in bicycle code.

Show answer

Surface code: $100 \cdot 1{,}500 = 150{,}000$ physical qubits for 100 logical qubits at $d = 27$. Bicycle code: $\lceil 100 / 12 \rceil = 9$ blocks of $[[144, 12, 12]]$, so $9 \cdot 144 = 1{,}296$ physical qubits, but at $d = 12$ rather than $d = 27$. To compare fairly, you would need a larger bicycle-code instance with $d \approx 27$, which IBM 2024 does not publish a concrete construction for; published projections suggest $\sim 1{,}500$ physical qubits per ~12 logical qubits at $d \approx 27$, so $\sim 12{,}500$ physical qubits for 100 logical qubits, vs. the surface code’s 150,000. About a 12× saving. The exact ratio depends on the projected block size and the surface-code overhead constants.
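The same answer as a short script, using only the numbers quoted in the exercise and the answer. The $\sim 1{,}500$-per-12 projection is taken from the answer text as given, and the variable names are illustrative.

```python
import math

LOGICAL = 100

surface_total = LOGICAL * 1_500                  # d = 27 surface code
blocks_d12 = math.ceil(LOGICAL / 12)             # [[144, 12, 12]] blocks needed
bicycle_d12_total = blocks_d12 * 144             # not a like-for-like distance
projected_d27_total = LOGICAL * 1_500 / 12       # ~1,500 qubits per ~12 logical at d ~ 27
saving = surface_total / projected_d27_total

print(f"surface code:           {surface_total:,}")
print(f"bicycle, d = 12:        {bicycle_d12_total:,} ({blocks_d12} blocks)")
print(f"bicycle, projected d27: {projected_d27_total:,.0f}")
print(f"saving:                 {saving:.0f}x")
```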

2. Connectivity check

A processor has degree-4 connectivity (each qubit connected to 4 nearest neighbors). Can it run bicycle codes? What modifications would be needed?

Show answer

Not directly. Bicycle codes need degree-6 (or higher, depending on the specific construction) connectivity. Three options: (1) add tunable couplers to extend each qubit’s effective neighborhood; (2) use 3D integration to connect qubits across chip layers; (3) emulate the missing couplings with SWAP-based routing, at a cost in circuit depth and added error. IBM Nighthawk’s 2026 architecture pursues option (1); Google’s transmon roadmap stays on degree-4 with surface code. The choice is hardware-fab-driven, not software-driven.

3. Decoder latency budget

A bicycle code at $d = 12$ needs to decode syndromes within a single syndrome-extraction cycle ($\sim 1\,\mu$s on superconducting hardware). BP-OSD on the published bicycle code takes $\sim 100\,\mu$s on a CPU. Is this a problem? If yes, what fixes it?

Show answer

Yes: the decoder is 100× too slow to keep up with hardware in real time. Without a fix, the cycle time has to slow to match the decoder, eating into the qubit-overhead advantage. Three fixes: (1) faster algorithms (incremental decoding across successive rounds, neural-network decoders); (2) specialized hardware (FPGA or ASIC implementations of BP-OSD, which target $\mu$s-scale latency for the published block sizes); (3) larger lookahead so the decoder works on a buffer rather than per-cycle. IBM’s Starling roadmap implies they intend a combination of (2) and (3). The decoder hardware is the unsung infrastructure problem of qLDPC fault tolerance.
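A small sketch of why the mismatch matters, assuming a single serial decoder that handles one syndrome round per call and the timings given in the exercise. The function names are hypothetical.

```python
def backlog_after(cycles, cycle_time_us=1.0, decode_time_us=100.0):
    """Undecoded syndrome rounds left after `cycles` rounds with one serial decoder."""
    decoded = int(cycles * cycle_time_us / decode_time_us)
    return max(0, cycles - decoded)


def required_slowdown(cycle_time_us=1.0, decode_time_us=100.0):
    """Factor by which the cycle must stretch for the decoder to keep up."""
    return max(1.0, decode_time_us / cycle_time_us)


one_second_of_cycles = 1_000_000  # 1 s of 1-microsecond cycles
print("backlog after 1 s:", backlog_after(one_second_of_cycles))
print("slowdown needed:  ", required_slowdown())
print("backlog with a 1 us decoder:", backlog_after(one_second_of_cycles, decode_time_us=1.0))
```

The backlog grows without bound unless the decoder's throughput matches the syndrome rate, which is why the fixes listed above all aim at throughput (faster algorithms, dedicated hardware) or at tolerating latency (buffering).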

4. Cross-architectural comparison

A vendor reports: “Our hardware achieves 10× fewer physical qubits for the same algorithm than IBM Starling.” Without seeing details, what are the three most likely explanations, and which would you want to verify?

Show answer

Three likely explanations: (1) they are using a different qLDPC family with better constants; verify by asking which family and what block size. (2) They are assuming a much lower physical error rate ($p = 10^{-5}$ rather than $10^{-3}$), which exponentially reduces the required code distance; verify the assumed $p$ and whether it is achievable on their hardware. (3) They are accounting for fewer line items (e.g., reporting algorithm qubits only, omitting factories and routing); verify by demanding the four-tuple from tutorial 27. Possibility (3) is by far the most common in unreviewed marketing claims; possibility (2) is the most common in academic-paper claims; possibility (1) is a genuine architectural advance and should be welcomed but verified against a published construction.
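Explanation (2) is easy to quantify with a back-of-envelope model. The sketch below uses a commonly quoted surface-code heuristic, $p_L \approx 0.1\,(p / p_{\text{th}})^{(d+1)/2}$ with $p_{\text{th}} = 10^{-2}$. The formula, its constants, and the target logical error rate are assumptions made for illustration and are not taken from this tutorial; real estimates should use the vendor's own fitted numbers.

```python
def required_distance(p, target, p_th=1e-2, prefactor=0.1):
    """Smallest odd distance meeting `target` under the assumed heuristic.

    Heuristic (an assumption, see lead-in): p_L ~ prefactor * (p / p_th) ** ((d + 1) / 2).
    Requires p < p_th, otherwise no distance is ever sufficient.
    """
    if p >= p_th:
        raise ValueError("physical error rate must be below threshold")
    d = 3
    while prefactor * (p / p_th) ** ((d + 1) / 2) > target:
        d += 2
    return d


TARGET = 3e-12  # illustrative per-round logical error budget

for p in (1e-3, 1e-5):
    d = required_distance(p, TARGET)
    print(f"p = {p:.0e}: d = {d}, ~{2 * d * d} physical qubits per logical qubit")
```

Under these assumed constants, a hundredfold better physical error rate cuts the per-logical-qubit count by roughly an order of magnitude on its own, with no change of code family, which is why the assumed $p$ is the first thing to check in such a claim.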

Where this goes next

This concludes the fault-tolerance arc that started with tutorial 19’s surface code. The error-correction track now covers: surface code basics (19), magic-state distillation (24), the Clifford structural reason it is needed (25), the Eastin-Knill no-go that makes it unavoidable (26), resource estimation as a discipline (27), and qLDPC as the surface-code successor (28). Future tutorials in this track will dig into specific decoder constructions (MWPM, BP-OSD, neural decoders), into Floquet codes and dynamical fault tolerance, and into measurement-based fault-tolerance constructions used by photonic platforms. If you want to move horizontally instead, tutorials in the algorithms track (block encoding, QSVT, amplitude estimation) build on the same fault-tolerance machinery from a different angle.
