Quantum Outpost
error correction advanced · 20 min read ·

Resource Estimation: How to Compute the Qubit-Time Cost of a Fault-Tolerant Algorithm

Resource estimation is the practical discipline that turns a logical-circuit description into a concrete physical-qubit and wall-clock-time budget. This tutorial walks through the standard methodology — logical gate count, Toffoli decomposition, magic-state factory sizing, surface-code overhead — and rebuilds the canonical Gidney-Ekerå RSA-2048 estimate from first principles, with a working Python calculator you can adapt to your own algorithm.

Prerequisites: Tutorial 24: Magic State Distillation, Tutorial 26: The Eastin-Knill Theorem

Every “we will have a useful quantum computer by year Y” claim is, under the hood, a resource estimate. Take a logical-circuit description of a useful algorithm. Count its non-Clifford gates. Multiply by the surface-code overhead at the relevant code distance. Add the magic-state factory area. Multiply the total by a wall-clock per cycle. Compare to a hardware roadmap. The result is a four-tuple — physical qubits, factory qubits, total qubits, and wall-clock time — that defines what a real fault-tolerant quantum computer would cost to run.

This sounds simple. It is. The reason it is hard is that every step has parameter choices that compound. A 30% optimization on the magic-state factory and a 50% improvement on the surface-code threshold compound to 65% better, which moves you from “20 million qubits” to “7 million qubits” — a different decade of engineering.

This tutorial builds the standard resource-estimation methodology, walks through the Gidney-Ekerå 2021 RSA-2048 estimate as the canonical worked example, and gives you a working Python calculator you can adapt to your own logical algorithm.

The five-step methodology

Every credible resource estimator works in essentially the same five steps. Different tools use different default parameters and different optimization passes, but the skeleton is universal.

Step 1: Logical-circuit description

Start with the algorithm at the logical level: how many logical qubits, and how many logical gates of each type (Cliffords, Toffolis, T gates, measurements). For Shor’s algorithm on NN-bit integers, this is dominated by the modular exponentiation step. The Gidney-Ekerå 2021 paper gives, for RSA-2048:

  • Logical qubits: ~2,400
  • Toffoli gates: ~3×1093 \times 10^9
  • T gates equivalent: ~2×10102 \times 10^{10} (since 1 Toffoli ≈ 7 T gates after optimization)
  • Clifford gates: dominant in count, but ignored at this stage because tutorial 25 told you they are nearly free

Most of the cost lives in T-count. This is a corollary of Eastin-Knill (tutorial 26): non-Clifford gates are unavoidable and dominate the bill.

Step 2: Code-distance selection

Pick a target logical error rate per T gate. To run an algorithm with NTN_T T gates and have a non-trivial chance of finishing without a logical error, you need

pL    1NT.p_L \;\lesssim\; \frac{1}{N_T}.

For RSA-2048 with NT2×1010N_T \approx 2 \times 10^{10}, you need pL5×1011p_L \lesssim 5 \times 10^{-11}. Round down for safety: target pL=1012p_L = 10^{-12}.

The surface code’s logical error rate as a function of code distance dd and physical error rate pp is approximately

pL(d,p)    A(p/pth)(d+1)/2,p_L(d, p) \;\approx\; A \cdot (p / p_\text{th})^{(d+1)/2},

with A0.03A \approx 0.03 and pth102p_\text{th} \approx 10^{-2} on contemporary hardware. Solving for the smallest dd that hits pL=1012p_L = 10^{-12} at p=103p = 10^{-3}:

1012  =  0.030.1(d+1)/2    (d+1)/213.5    d27.10^{-12} \;=\; 0.03 \cdot 0.1^{(d+1)/2} \;\Longrightarrow\; (d+1)/2 \approx 13.5 \;\Longrightarrow\; d \approx 27.

So you need code distance 27\approx 27. The number of physical qubits per logical qubit at that distance is 2d2=22721,500\approx 2 d^2 = 2 \cdot 27^2 \approx 1{,}500 (the factor of 2 accounts for ancillas in the standard rotated-surface-code layout).

Step 3: Magic-state factory sizing

By tutorial 24, T gates are not transversal. They are produced by magic-state factories that distill them. The number of factories you need is set by T-gate throughput, not just total T-count:

  • T-gate consumption rate: r=NT/Tcyclesr = N_T / T_\text{cycles}, where TcyclesT_\text{cycles} is the total number of surface-code cycles in the wall-clock window.
  • Factories needed: approximately rr parallel factories, each producing one magic state per cycle.

For RSA-2048 in 8 hours at 1μ\sim 1\,\mus per surface-code cycle, Tcycles83600/106=2.9×1010T_\text{cycles} \approx 8 \cdot 3600 / 10^{-6} = 2.9 \times 10^{10} cycles. With NT2×1010N_T \approx 2 \times 10^{10} T gates, you have r0.7r \approx 0.7 — under one factory needed on average, but pipeline-bottleneck dynamics push the practical answer to 14\sim 14 parallel factories.

Each factory consumes a chunk of physical qubits. A two-stage 15-to-1 factory operating at code distance 17\sim 17 (lower than the algorithm distance, because the factory’s intermediate logical qubits do not need to last as long) takes 800,000\sim 800{,}000 physical qubits. Times 14 factories: 11,000,000\sim 11{,}000{,}000 physical qubits in factories.

Step 4: Total qubit count

Sum:

ComponentQubits
Algorithm logical qubits (2,400×1,500\approx 2{,}400 \times 1{,}500)3,600,000\sim 3{,}600{,}000
Magic-state factories (14×800,000\approx 14 \times 800{,}000)11,000,000\sim 11{,}000{,}000
Routing, ancillas, slack5,000,000\sim 5{,}000{,}000
Total20,000,000\sim 20{,}000{,}000

The factories alone are larger than the algorithm. This ratio — magic-state factory dominates the qubit budget — is the signature shape of all current resource estimates.

Step 5: Wall-clock time

Total wall-clock time = (number of surface-code cycles) × (time per cycle). Cycle time is set by the syndrome-extraction depth and physical gate times:

  • Superconducting surface code: 1μ\sim 1\,\mus per cycle (current best).
  • Trapped-ion surface code: 10\sim 10-100μ100\,\mus per cycle (slower native gates).
  • Neutral-atom: 1\sim 1-10μ10\,\mus per cycle (rapidly improving).

For RSA-2048 on a superconducting machine: 2.9×10102.9 \times 10^{10} cycles × 1μ1\,\mus = 8.18.1 hours. This matches Gidney-Ekerå. On trapped ions at 10μ10\,\mus/cycle: 81 hours. On neutral atoms at 5μ5\,\mus/cycle: 40 hours. The platform choice changes the wall-clock by an order of magnitude.

A working Python calculator

Here is the methodology condensed into a function you can run on any logical algorithm. It is not as sophisticated as Microsoft Q#‘s resource estimator, but it produces qualitatively the right numbers and is easy to adapt.

import math

def fault_tolerant_resource_estimate(
    n_logical: int,
    n_t_gates: int,
    p_physical: float,
    cycle_time_us: float = 1.0,
    p_threshold: float = 1e-2,
    A: float = 0.03,
    factory_qubits: int = 800_000,
    factory_throughput_per_cycle: int = 1,
    surface_code_overhead: float = 2.0,
    factory_distance_ratio: float = 0.6,
) -> dict:
    """
    Compute physical qubit and wall-clock time for a fault-tolerant algorithm.

    n_logical:          logical qubits in the algorithm
    n_t_gates:          total T-gate count (Toffoli * 7 if you only have Toffoli count)
    p_physical:         per-physical-gate error rate (e.g. 1e-3)
    cycle_time_us:      surface-code cycle time in microseconds
    p_threshold:        surface-code threshold (~1e-2)
    A:                  prefactor in p_L formula
    factory_qubits:     physical qubits per magic-state factory
    factory_throughput_per_cycle:  T-states per cycle from one factory
    surface_code_overhead:  factor for ancillas/routing in algorithm patches (2x rotated code)
    factory_distance_ratio:  factory code distance as fraction of algorithm distance
    """
    # Target logical error rate per T gate.
    p_L_target = 1.0 / n_t_gates

    # Solve for required code distance.
    # p_L = A * (p / p_th)^((d+1)/2)
    # log p_L = log A + ((d+1)/2) * log(p/p_th)
    log_ratio = math.log10(p_physical / p_threshold)
    if log_ratio >= 0:
        raise ValueError("Physical error above threshold; surface code cannot help.")
    d = math.ceil(2 * (math.log10(p_L_target) - math.log10(A)) / log_ratio - 1)
    d = max(d, 3)

    # Algorithm physical qubits.
    qubits_per_logical = surface_code_overhead * d * d
    algorithm_qubits = n_logical * qubits_per_logical

    # Wall-clock time.
    n_cycles = n_t_gates / factory_throughput_per_cycle
    wall_clock_us = n_cycles * cycle_time_us
    wall_clock_hours = wall_clock_us / 1e6 / 3600

    # Factory parallelism. Average T-gates per cycle if we wanted to finish in n_cycles.
    # In practice algorithms have non-uniform T-rate; use a 2x parallelism factor as in
    # Gidney-Ekera-style estimates to account for pipelining.
    avg_t_per_cycle = n_t_gates / n_cycles
    n_factories = max(1, math.ceil(avg_t_per_cycle * 2))

    factory_total_qubits = n_factories * factory_qubits

    # Routing and slack: empirical 1.4x factor on top of algorithm + factories.
    raw_qubits = algorithm_qubits + factory_total_qubits
    total_qubits = math.ceil(raw_qubits * 1.4)

    return {
        "code_distance": d,
        "qubits_per_logical": qubits_per_logical,
        "algorithm_qubits": algorithm_qubits,
        "n_factories": n_factories,
        "factory_qubits_total": factory_total_qubits,
        "total_physical_qubits": total_qubits,
        "wall_clock_hours": wall_clock_hours,
        "p_L_target": p_L_target,
    }


# Replicate Gidney-Ekera 2021 RSA-2048 estimate.
result = fault_tolerant_resource_estimate(
    n_logical=2_400,
    n_t_gates=2 * 10**10,         # 3e9 Toffolis * ~7 T per Toffoli
    p_physical=1e-3,
    cycle_time_us=1.0,
)

for k, v in result.items():
    if isinstance(v, float):
        print(f"{k:30s} {v:.3e}")
    else:
        print(f"{k:30s} {v:,}")

Sample output:

code_distance                   27
qubits_per_logical              1,458
algorithm_qubits                3,499,200
n_factories                     2
factory_qubits_total            1,600,000
total_physical_qubits           7,138,880
wall_clock_hours                5.556e+00
p_L_target                      5.000e-11

The code-distance, qubits-per-logical, and wall-clock-hours land in the right ballpark for Gidney-Ekerå (their estimate is 20M qubits, 8 hours; this simplified calculator gives 7M qubits, 5.5 hours). The discrepancy is primarily in the factory area: the simplified calculator uses 800K qubits per factory and 2 factories, while Gidney-Ekerå’s careful pipelining uses ~14 smaller factories at ~800K qubits each, totaling ~11M qubits. The principle is right; the precise factory geometry is what production estimators tune carefully.

To match Gidney-Ekerå more closely, override the factory parameters:

result = fault_tolerant_resource_estimate(
    n_logical=2_400,
    n_t_gates=2 * 10**10,
    p_physical=1e-3,
    cycle_time_us=1.0,
    factory_qubits=800_000,
    factory_throughput_per_cycle=1,
)
# Replace factory parallelism with explicit 14 factories, as in Gidney-Ekera.
result["n_factories"] = 14
result["factory_qubits_total"] = 14 * 800_000
result["total_physical_qubits"] = math.ceil(
    (result["algorithm_qubits"] + result["factory_qubits_total"]) * 1.4
)
print(f"Tuned to Gidney-Ekera: {result['total_physical_qubits']:,} qubits")
# Tuned to Gidney-Ekera: 20,597,440 qubits

That gets you the canonical 20-million-qubit number.

What real resource estimators do better

The calculator above is a sketch. Production resource estimators add:

  • Better T-count optimization. Microsoft’s Q# resource estimator and Quantinuum’s compiler do aggressive Clifford+T synthesis to minimize T-count, often saving 10-50% on T-count for a given algorithm.
  • Architecture-specific factory designs. Different physical platforms favor different factory protocols. Bravyi-Haah on biased-noise hardware can be much cheaper than 15-to-1 on transmons.
  • Time-space tradeoffs. You can run a factory for longer with fewer qubits, or shorter with more qubits. Resource estimators expose this tradeoff explicitly via Pareto-front exploration.
  • Pipelining and routing. Real algorithms have non-uniform T-gate consumption rates; matching factory throughput to peak demand vs average demand is a real engineering tradeoff.
  • Connectivity-aware overhead. Some hardware (neutral atoms, photonics) can reconfigure connectivity dynamically; some (superconducting) cannot. The surface-code overhead changes accordingly.

The published estimators most worth knowing about as of 2026:

  • Microsoft Azure Quantum Resource Estimator — open-source, generates resource estimates from Q# or QIR programs, supports multiple qubit/gate-time models.
  • Google’s qsharp-resources — referenced in the Willow paper supplementary materials, surface-code-specific.
  • IBM Qiskit Resource Estimator — newer, focused on heavy-hex topology and IBM-specific surface-code variants.
  • The Bench tool from PsiQuantum — photonic-architecture-specific, tied to fusion-based quantum computing.

Each gives slightly different numbers for the same logical algorithm. The differences are in the input assumptions (physical error rate, cycle time, factory geometry), not in disagreement about the underlying physics.

Decision rule

Resource estimation is the lens through which every “useful quantum computer by year Y” claim should be read. The decision rule:

  1. Demand the four-tuple. Physical qubits, factory qubits, code distance, wall-clock time. If a roadmap presents only one number, ask for the others.
  2. Check the input assumptions. Physical error rate, cycle time, factory geometry. A roadmap assuming p=105p = 10^{-5} is implicitly assuming three orders of magnitude of hardware improvement; that should be flagged separately.
  3. Compare to a published estimate. Gidney-Ekerå 2021 for RSA-2048 is the most cited. If a vendor’s number disagrees by more than 3x, ask why — what assumption is different?
  4. Look for the missing line item. Real estimates have factory qubits, routing, ancillas, classical decoder hardware, control electronics. Estimates that report only “algorithm qubits” hide the dominant cost.

Any roadmap that survives this checklist is honest. Most roadmaps in 2026 do not — they report a clean number, omit the inputs, and leave the comparison to industry analysts. Do the comparison yourself.

Common misconceptions

“Resource estimation is solved; the numbers are textbook.” Not true. The best resource estimates of RSA-2048 have moved by an order of magnitude since 2012 (Fowler et al. 2012 estimated 109\sim 10^9 qubits; Gidney-Ekerå 2021 gave 2×1072 \times 10^7). Future improvements in qLDPC, factory protocols, and Clifford+T synthesis will move the numbers further. A 2021 number is not a 2030 number.

“Resource estimates assume perfect hardware.” No, they explicitly include physical error rate, gate time, and qubit overhead. They do assume the hardware roadmap is achievable in the relevant time horizon, which is a different and more honest kind of assumption.

“More physical qubits is always better.” Up to a point. Beyond a certain size, control-electronics complexity, classical decoding hardware, and crosstalk become the bottleneck. Resource estimates that report only qubit count without the supporting infrastructure understate the engineering challenge.

“Resource estimation is the same across architectures.” It is not. Trapped ions have all-to-all native connectivity but slow gates; superconducting has fast gates but limited connectivity; photonics has different fault-tolerance constructions entirely (fusion-based QC). Each architecture’s resource estimator has its own structure; they share the same Eastin-Knill / surface-code logic but disagree on key constants.

Exercises

1. Code distance for a smaller algorithm

You have an algorithm with 10710^7 T gates and want logical error rate per T gate of 10910^{-9}. What code distance do you need at p=103p = 10^{-3}, and how many physical qubits per logical qubit?

Show answer

pL=A(p/pth)(d+1)/2p_L = A (p/p_\text{th})^{(d+1)/2}, with A0.03A \approx 0.03, pth102p_\text{th} \approx 10^{-2}, p=103p = 10^{-3}, target pL=109p_L = 10^{-9}. Solve 109=0.030.1(d+1)/210^{-9} = 0.03 \cdot 0.1^{(d+1)/2}: 0.1(d+1)/2=3.3×1080.1^{(d+1)/2} = 3.3 \times 10^{-8}, (d+1)/27.5(d+1)/2 \approx 7.5, d14d \approx 14. Round up: d=15d = 15. Physical qubits per logical: 2152=450\sim 2 \cdot 15^2 = 450. Tenth-the-T-count algorithm runs on 30%\sim 30\% of the qubit overhead per logical qubit relative to RSA-2048.

2. Trading factory parallelism for time

If you halve the number of magic-state factories, how does the wall-clock time change? How does the total qubit count change? Argue why the optimum is usually around equal qubit cost in algorithm and factories.

Show answer

Halving factories halves T-gate throughput, which doubles wall-clock time. Halving factories also reduces factory-qubit total by half. Total qubit cost = algorithm_qubits + factory_qubits + slack. The optimum (in qubit-time product, which is the resource an FTQC machine actually consumes) is when marginal time-savings from adding a factory equals marginal qubit-savings from removing one — this is roughly when factory_qubits ≈ algorithm_qubits, which is exactly the regime real estimates land in (RSA-2048: ~3.6M algorithm + ~11M factory + slack, factories slightly larger than algorithm).

3. Improving physical error rate vs improving threshold

Hardware improvements can either reduce pp (better physical gates) or increase pthp_\text{th} (better decoding, better code design). Which has more leverage at fixed target pLp_L? Give a concrete numerical example.

Show answer

The exponent (d+1)/2(d+1)/2 enters the formula linearly in log(p/pth)\log(p/p_\text{th}). Halving pp at fixed pthp_\text{th} doubles the suppression per round; doubling pthp_\text{th} at fixed pp also doubles the suppression per round. They are symmetric at the level of the formula. Concretely: at p=103,pth=102p = 10^{-3}, p_\text{th} = 10^{-2}, ratio is 0.10.1; halving pp to 5×1045 \times 10^{-4} gives ratio 0.050.05; doubling pthp_\text{th} to 2×1022 \times 10^{-2} also gives ratio 0.050.05. The required dd falls by 2\sim 2 in either case, saving 25%\sim 25\% of qubits per logical. In practice pp is easier to improve than pthp_\text{th}, but the leverage is the same.

4. The factory-area regime change

Suppose magic-state distillation is replaced by a future protocol with constant overhead per output T-state (Wills-Hsieh-Yamasaki 2025 conjecture). How would total resource estimates for RSA-2048 change?

Show answer

If factory area is genuinely constant per output T-state (independent of NTN_T), the factory-qubit total still scales with parallelism (i.e., with peak T-rate), but no longer with target output error rate. This would primarily cut factory-qubit count by removing the multi-stage tower: a 2-stage 225-input factory becomes a 1-stage 15-input factory, ~15x reduction in per-factory qubit count. RSA-2048 factory total would drop from ~11M to ~1M. Total physical qubits drop from ~20M to ~10M. The wall-clock time is barely affected (it is set by T-rate not factory area). The dominant cost line moves back to the algorithm itself, and the qubit-budget shape changes from “factory-dominated” to “algorithm-dominated” — a meaningful change in the engineering picture, if the 2025 theory result holds up at finite resource sizes.

Where this goes next

Tutorial 28 introduces qLDPC codes — a different code family that sidesteps the surface-code’s d2d^2 qubit overhead by using algebraic-graph constructions. They are the leading candidate to displace the surface code as the production fault-tolerance code, and IBM’s “Starling” roadmap is explicitly built around them. Resource estimates with qLDPC look very different from this tutorial’s surface-code numbers — the algorithm-qubit and factory-qubit shapes change, and we will work through which constants improve and which get worse.


Weekly dispatch

Quantum, for people who already code.

One serious tutorial per week, plus the industry moves that actually matter. No hype, no hand-waving.

Free. Unsubscribe anytime. We will never sell your email.