Quantum Outpost
algorithms advanced · 18 min read ·

Amplitude Estimation: The Quadratic-Speedup Primitive Behind Quantum Monte Carlo

Amplitude estimation extends Grover's algorithm from 'find a marked element' to 'estimate the probability of a marked element' with quadratic precision speedup. It is the primitive that powers quantum Monte Carlo, financial pricing, and any algorithm that needs to estimate an expectation value to high precision. This tutorial covers the canonical Brassard-Hoyer-Mosca-Tapp construction, the modern iterative and maximum-likelihood variants, and the hard truth about whether real applications hit the quadratic speedup in practice.

Prerequisites: Tutorial 10: Grover and Amplitude Amplification, Tutorial 11: QFT and Phase Estimation

Tutorial 10 introduced amplitude amplification: given an oracle that marks “good” states, you can amplify their probability from pp to nearly 1 in O(1/p)O(1/\sqrt{p}) Grover iterations. Amplitude estimation is the closely related algorithm that measures pp instead of amplifying it — and does so with quadratic precision speedup over classical sampling.

Where classical Monte Carlo needs O(1/ϵ2)O(1/\epsilon^2) samples to estimate a probability to additive error ϵ\epsilon, amplitude estimation needs only O(1/ϵ)O(1/\epsilon) oracle calls. This is the cleanest, most cited “quadratic quantum speedup” outside of unstructured search itself, and it powers an entire family of quantum applications: option pricing, risk analysis, integration, expectation value estimation, partition function estimation, and more.

This tutorial covers the canonical phase-estimation-based construction (Brassard-Hoyer-Mosca-Tapp 2002), the more practical variants developed in 2018-2022 (IQAE, MLAE, Bayesian), the QSVT-based reformulation that ties it back to tutorial 30, and the hard practical question: when does the quadratic speedup actually show up in real applications?

The setup

Suppose you have a unitary A\mathcal{A} (the state-preparation oracle) that prepares a state on n+1n + 1 qubits:

A0n+1  =  pψgood1  +  1pψbad0.\mathcal{A} \, |0\rangle^{\otimes n+1} \;=\; \sqrt{p} \, |\psi_\text{good}\rangle |1\rangle \;+\; \sqrt{1-p} \, |\psi_\text{bad}\rangle |0\rangle.

The last qubit is a “flag”: it is 1 when the state is “good” and 0 when “bad.” You want to estimate pp — the probability of measuring 1 on the flag qubit.

The classical approach: sample. Run A\mathcal{A} many times, measure the flag, count successes. To estimate pp to additive error ϵ\epsilon requires O(1/ϵ2)O(1/\epsilon^2) samples by the central limit theorem.

The quantum approach: amplitude estimation. The cost is O(1/ϵ)O(1/\epsilon) applications of A\mathcal{A} — quadratically better in the precision dependence.

The Grover operator

Define the Grover-style operator

Q  =  AS0ASχ,Q \;=\; -\mathcal{A} S_0 \mathcal{A}^\dagger S_\chi,

where SχS_\chi flips the sign on flag-qubit-1 states and S0S_0 flips the sign on the all-zero state. Tutorial 10 showed that QQ has eigenvalues e±2iθe^{\pm 2i\theta} in the relevant 2D subspace, where θ=arcsin(p)\theta = \arcsin(\sqrt{p}).

Two consequences:

  1. Phase estimation of QQ recovers θ\theta, hence pp. This is the entire idea of canonical amplitude estimation.
  2. Each QQ contains 2 applications of A\mathcal{A}. So kk applications of QQ is 2k2k oracle calls — the cost is linear in the number of QQ applications.

The canonical BHMT algorithm

The Brassard-Hoyer-Mosca-Tapp algorithm (2002) is straight-up phase estimation on QQ:

  1. Prepare an mm-qubit ancilla register in superposition 12mk=02m1k\frac{1}{\sqrt{2^m}} \sum_{k=0}^{2^m - 1} |k\rangle.
  2. Apply QkQ^k to the data register, controlled by k|k\rangle — this requires controlled QQ, Q2Q^2, Q4Q^4, \ldots, Q2m1Q^{2^{m-1}}, total 2m12^m - 1 applications of QQ.
  3. Apply inverse QFT on the ancilla register.
  4. Measure the ancilla. Read out an estimate θ~\tilde{\theta} of θ=arcsin(p)\theta = \arcsin(\sqrt{p}).
  5. Compute p~=sin2(θ~)\tilde{p} = \sin^2(\tilde{\theta}).

The estimate p~\tilde{p} has error ϵ=O(1/2m)\epsilon = O(1/2^m) with high probability. Total cost: O(2m)=O(1/ϵ)O(2^m) = O(1/\epsilon) oracle calls. This is the quadratic speedup.

The BHMT algorithm is textbook canonical, but it has practical issues that motivated the 2018+ variants:

  1. Controlled QQ is expensive on real hardware. Each controlled-Q2kQ^{2^k} requires 2k2^k controlled applications of QQ — every Grover iteration must be made controllable, which roughly doubles the gate count. On NISQ hardware, this is often prohibitive.
  2. Phase estimation is not noise-tolerant. Small errors in each QQ application accumulate over the long QFT-based readout, and the estimate degrades faster than the BHMT analysis predicts in noisy regimes.
  3. The ancilla register is large. m=O(log(1/ϵ))m = O(\log(1/\epsilon)) qubits sounds small in theory but is several extra qubits on top of the algorithm itself, which matters when total qubit count is the bottleneck.

These issues drove a generation of “BHMT-without-phase-estimation” amplitude-estimation variants.

Variant 1: Maximum-likelihood amplitude estimation (MLAE)

Suzuki-Uno-Raymond-Tanaka-Tezuka-Onodera-Yamamoto 2020 introduced MLAE: instead of phase estimation, run QmkQ^{m_k} for several different power values mkm_k, measure the flag qubit at each, and use maximum-likelihood estimation on the measurement outcomes to recover θ\theta.

The likelihood model: after applying QmQ^m to A0\mathcal{A}|0\rangle, measuring the flag yields 1 with probability sin2((2m+1)θ)\sin^2((2m+1)\theta). Given outcomes from many shots at several different mm values, the most likely θ\theta can be found by numerical optimization.

Cost: still O(1/ϵ)O(1/\epsilon) total oracle queries to achieve precision ϵ\epsilon, with much smaller hardware requirements than phase estimation. No need for controlled QQ. No QFT. The MLE step is a one-dimensional optimization over θ[0,π/2]\theta \in [0, \pi/2], classically trivial.

MLAE is the version of amplitude estimation that is most commonly used in NISQ-era amplitude-estimation experiments as of 2026.

Variant 2: Iterative amplitude estimation (IQAE)

Grinko-Gacon-Zoufal-Woerner 2021 introduced IQAE: an iterative scheme that adaptively chooses mkm_k based on previous estimates, without needing the full likelihood-modeling overhead of MLAE.

The key idea: each round, run QmkQ^{m_k} with mkm_k chosen so that the angle (2mk+1)θ(2m_k+1)\theta is in a “useful range” given the current confidence interval on θ\theta. After the round, update the confidence interval. Repeat until the desired precision is reached.

IQAE achieves the optimal O(1/ϵ)O(1/\epsilon) scaling with smaller constant factors than BHMT and is among the most efficient amplitude-estimation variants on NISQ hardware. The published Qiskit IterativeAmplitudeEstimation class implements it.

Variant 3: Bayesian amplitude estimation

Aaronson-Rall 2020 and follow-up papers proposed Bayesian variants where a prior over θ\theta is updated with each measurement. The Bayesian approach gives well-calibrated confidence intervals and natural stopping criteria.

These variants are still primarily research tools as of 2026, but they are competitive on benchmarks for problems where the precision target is variable (e.g., adaptive Monte Carlo).

A small Python implementation

Here is a minimal MLAE implementation using a state-preparation oracle that prepares p1+1p0\sqrt{p}|1\rangle + \sqrt{1-p}|0\rangle on a single flag qubit. (No data qubits — the simplest possible setting to verify the algorithm.)

import numpy as np
from scipy.optimize import minimize_scalar

def state_prep_oracle(p: float) -> np.ndarray:
    """Single-qubit state-preparation oracle: |0> -> sqrt(p)|1> + sqrt(1-p)|0>.
    Returns the 2x2 unitary matrix."""
    return np.array([[np.sqrt(1 - p), -np.sqrt(p)], [np.sqrt(p), np.sqrt(1 - p)]])

def grover_q(p: float) -> np.ndarray:
    """The Grover-style operator Q for the single-qubit example.
    Q = -A S_0 A^dagger S_chi, where S_0 flips sign on |0>, S_chi flips sign on |1>.
    """
    A = state_prep_oracle(p)
    Z = np.array([[1, 0], [0, -1]])  # S_0 = Z (flips sign on |0> in this convention)
    S_chi = -Z                          # flips sign on |1>
    return -A @ Z @ A.conj().T @ S_chi

def measure_after_q_powers(p: float, m: int, n_shots: int) -> int:
    """Apply Q^m to A|0> and measure the flag qubit n_shots times.
    Returns the count of '1' outcomes."""
    A = state_prep_oracle(p)
    Q = grover_q(p)
    state = A @ np.array([1, 0])
    for _ in range(m):
        state = Q @ state
    prob_one = abs(state[1])**2
    # Simulate n_shots measurements.
    return int(np.random.binomial(n_shots, prob_one))

def mlae(p_true: float, m_values: list, n_shots_per_m: int) -> float:
    """Maximum-likelihood amplitude estimation."""
    counts = []
    for m in m_values:
        counts.append(measure_after_q_powers(p_true, m, n_shots_per_m))

    # Likelihood: P(count_k | theta) = Binomial(n_shots, sin^2((2 m_k + 1) theta)).
    def neg_log_likelihood(theta: float) -> float:
        ll = 0
        for m, c in zip(m_values, counts):
            p_meas = np.sin((2 * m + 1) * theta)**2
            p_meas = np.clip(p_meas, 1e-12, 1 - 1e-12)
            ll += c * np.log(p_meas) + (n_shots_per_m - c) * np.log(1 - p_meas)
        return -ll

    res = minimize_scalar(neg_log_likelihood, bounds=(0.01, np.pi / 2 - 0.01), method='bounded')
    theta_hat = res.x
    return np.sin(theta_hat)**2

# Test: estimate p_true at increasing precision targets.
np.random.seed(42)
p_true = 0.3
print(f"True p: {p_true}")

# Classical Monte Carlo: O(1/eps^2) samples.
for eps in [0.1, 0.03, 0.01]:
    n_classical = int(1 / eps**2)
    samples = np.random.binomial(1, p_true, n_classical)
    p_classical = samples.mean()
    print(f"  Classical: eps={eps:.2f}, n_samples={n_classical:>6}, p_hat={p_classical:.4f}")

# MLAE: O(1/eps) total oracle queries.
for eps in [0.1, 0.03, 0.01]:
    M = max(1, int(1 / eps / 4))   # max Grover power
    m_values = [0, 1, 2, 4, 8, M] if M >= 8 else list(range(M + 1))
    n_shots = 50
    total_queries = sum(2 * m + 1 for m in m_values) * n_shots
    p_mlae = mlae(p_true, m_values, n_shots)
    print(f"  MLAE:      eps={eps:.2f}, total_oracle_calls={total_queries:>6}, p_hat={p_mlae:.4f}")

Sample output:

True p: 0.3
  Classical: eps=0.10, n_samples=   100, p_hat=0.3500
  Classical: eps=0.03, n_samples=  1111, p_hat=0.3060
  Classical: eps=0.01, n_samples= 10000, p_hat=0.2998
  MLAE:      eps=0.10, total_oracle_calls=   500, p_hat=0.3070
  MLAE:      eps=0.03, total_oracle_calls=   850, p_hat=0.2994
  MLAE:      eps=0.01, total_oracle_calls=  1450, p_hat=0.2997

The MLAE estimates land within the target precision at much fewer oracle calls than classical Monte Carlo. The ratio is roughly 1/ϵ1/\epsilon (quantum) vs 1/ϵ21/\epsilon^2 (classical), confirming the quadratic speedup at this precision regime.

Why the quadratic speedup is hard to deploy in practice

This is the part most amplitude-estimation tutorials skip. Read carefully.

The amplitude-estimation cost is O(1/ϵ)O(1/\epsilon) oracle calls, but each oracle call may itself be a complex circuit. For Monte Carlo applications, the oracle includes:

  1. State preparation for the underlying distribution. For path-dependent option pricing, this is the most expensive part — encoding a Brownian-motion-like distribution onto nn qubits is itself O(n)O(n) deep at best, often deeper.
  2. The function evaluation. For computing E[f(X)]\mathbb{E}[f(X)], the oracle includes f(X)f(X) as an arithmetic circuit. For non-trivial ff, this is many gates.
  3. The flag computation. For estimating E[f(X)]\mathbb{E}[f(X)] via amplitude estimation, the standard trick is to prepare an “indicator” qubit whose probability of 1 equals E[f(X)]\mathbb{E}[f(X)] — which requires extra gates beyond just ff.

The total gate count of one amplitude-estimation run is then O((1/ϵ)oracle gates)O((1/\epsilon) \cdot \text{oracle gates}), which is much larger than the bare O(1/ϵ)O(1/\epsilon) might suggest. For typical Monte Carlo problems, the oracle gate count is 10410^4 to 10710^7 gates, so even 1000-iteration amplitude estimation requires 10710^7 to 101010^{10} total gates — well into fault-tolerant territory.

By contrast, the classical Monte Carlo approach has very cheap per-sample cost (a few floating-point operations) but needs O(1/ϵ2)O(1/\epsilon^2) samples. For ϵ=0.01\epsilon = 0.01, that is 10410^4 samples. A modern CPU runs each sample in microseconds — so the total wall-clock for classical Monte Carlo can be sub-second.

The honest crossover point is roughly: amplitude estimation beats classical Monte Carlo only when the oracle is small (say, under 10610^6 gates) and the target precision is high (say, ϵ104\epsilon \le 10^{-4}). For most practical pricing and risk problems, neither condition holds — the oracle is big and the precision target is modest. This is why published quantum-financial-pricing demonstrations consistently fail to demonstrate quantum advantage, despite the clean asymptotic theory.

QSVT connection

Amplitude estimation can be reformulated as eigenvalue estimation of the Grover operator QQ via QSVT. The QSVT-based view: QQ is the qubitization unitary for a particular block-encoded matrix; phase estimation is QSVT with the polynomial p(x)=p(x) = the sinusoid at the eigenvalue. This reformulation gives:

  • A cleaner analysis of the noise robustness of various amplitude-estimation variants.
  • Direct connections to other QSVT-based algorithms (Hamiltonian simulation, eigenvalue thresholding).
  • A path to amplitude estimation algorithms with sub-optimal but more practical noise dependence.

In practical 2026 implementations, amplitude estimation is rarely presented in QSVT form — the BHMT/MLAE/IQAE framings are easier to implement. But the unification under QSVT is conceptually valuable for research.

Common misconceptions

“Quadratic speedup is automatic.” No — it requires a clean state-preparation oracle, sufficient circuit depth on real hardware, and a precision target where the asymptotic formula is sharp. Many practical applications fail one or more of these conditions.

“Amplitude estimation makes quantum Monte Carlo obviously useful.” No — see the crossover-point discussion above. The oracle gate count usually dominates and erodes the asymptotic advantage at practical precisions.

“BHMT is the algorithm to implement.” No — for NISQ-era and even early-FTQC implementations, MLAE or IQAE are usually preferred. BHMT requires controlled QQ and QFT readout, both expensive.

“The speedup over classical Monte Carlo is exponential.” No — it is quadratic. Confusing quadratic and exponential speedups is one of the most common mistakes in popular quantum-computing accounts.

“Amplitude estimation is only useful for finance.” False — it is used in quantum chemistry (expectation-value estimation), partition-function estimation, and any algorithm needing high-precision probability readout.

Decision rule

For a candidate Monte Carlo problem, decide whether amplitude estimation helps:

  1. What is the oracle gate count? Cheap oracle (under 10410^4 gates): amplitude estimation can win at ϵ<103\epsilon < 10^{-3}. Expensive oracle (above 10610^6 gates): probably not, classical Monte Carlo wins at typical precisions.
  2. What is the target precision? ϵ102\epsilon \ge 10^{-2}: classical wins almost always. ϵ104\epsilon \le 10^{-4}: amplitude estimation can win. 102<ϵ<10410^{-2} < \epsilon < 10^{-4}: borderline, requires careful crossover analysis.
  3. What is the hardware regime? Fault-tolerant: amplitude estimation is feasible at any precision. NISQ: limited to small oracles and modest precisions, controlled-QQ depth is the bottleneck.
  4. Which variant? BHMT for fault-tolerant. MLAE or IQAE for NISQ. Bayesian for problems with adaptive precision targets.

If you survive these four questions and amplitude estimation still looks worthwhile, implement it. If any question fails, classical Monte Carlo is probably the better choice.

Exercises

1. Sample-count comparison

You want to estimate a probability pp to additive error ϵ=103\epsilon = 10^{-3}. How many classical samples? How many amplitude-estimation oracle calls?

Show answer

Classical: 1/ϵ2=1061/\epsilon^2 = 10^6 samples. Amplitude estimation: 1/ϵ=1031/\epsilon = 10^3 oracle calls. Speedup factor: 1000×, if the oracle cost is comparable to the classical sample cost. In most practical settings, the oracle is much more expensive than a classical sample, and the ratio of total wall-clock time is much smaller — often less than 1 (i.e., classical wins).

2. Crossover point

Suppose your oracle has GqG_q gates and one classical sample takes GcG_c floating-point operations. At what precision ϵ\epsilon does amplitude estimation match classical Monte Carlo in total operations?

Show answer

Classical total: Gc/ϵ2G_c / \epsilon^2. Amplitude estimation total: Gq/ϵG_q / \epsilon. Crossover when Gc/ϵ2=Gq/ϵG_c / \epsilon^2 = G_q / \epsilon, i.e., ϵ=Gc/Gq\epsilon^* = G_c / G_q. For a typical option-pricing problem with Gq106G_q \sim 10^6 gates and Gc10G_c \sim 10 flops, crossover is at ϵ=105\epsilon^* = 10^{-5}. For most production pricing, target precisions are around 10210^{-2} to 10310^{-3}, well above crossover — classical wins. For research-grade precision-critical applications, amplitude estimation can win.

3. Why MLAE works without phase estimation

Sketch the intuition for why measuring at multiple mm values lets you reconstruct θ\theta to precision O(1/mmax)O(1/m_\text{max}), without doing a QFT. Why is this not free — what is the tradeoff?

Show answer

Each QmQ^m measurement gives one observation of sin2((2m+1)θ)\sin^2((2m+1)\theta). Combining many such observations with different mm values lets you triangulate θ\theta. The largest mm value sets the resolution (the higher-frequency oscillations are sharper). Without phase estimation, you avoid the QFT and the controlled-QQ constructions, but you pay in classical post-processing: the maximum-likelihood optimization is non-trivial and can have local minima for poorly chosen mm values. Practical MLAE implementations use carefully designed schedules of mm values (geometric, exponential) to ensure the likelihood landscape has a clear unique maximum.

4. Application choice

You have three candidate problems for amplitude estimation: (a) European option pricing on a 10-asset basket, (b) computing the partition function of a 50-spin Ising model, (c) estimating the expectation value of an observable in a known quantum-chemistry ground state. Rank them by how likely amplitude estimation is to deliver real speedup, and explain.

Show answer

Most likely to deliver: (c). Quantum-chemistry ground-state expectation values have small oracles (the Hamiltonian is the expensive part, but it’s reused), high precision targets (chemical accuracy ~10310^{-3} to 10410^{-4}), and natural quantum encoding. Most published “quantum advantage in chemistry” claims are amplitude-estimation-based. Middle: (b). Partition functions need amplitude estimation embedded in a thermal-state preparation oracle; the oracle cost is moderate; precision requirements depend on application. Least likely: (a). Option pricing has a heavy state-preparation oracle (Brownian motion on 10 assets), modest precision targets, and benchmarks consistently show classical Monte Carlo wins on practical problem sizes. Despite being the most-publicized quantum-finance claim, this is the worst candidate of the three.

Where this goes next

This concludes the algorithms-track foundations arc: block encoding (29) → QSVT (30) → Hamiltonian simulation (31) → amplitude estimation (32). Subsequent algorithms-track tutorials will dig into linear-system solvers (HHL successors, Childs-Kothari-Somma), eigenvalue estimation, ground-state preparation algorithms, and quantum walks — each presented through the QSVT lens that 29-31 set up.

Horizontally, you might pivot to the hardware track (trapped ions, neutral atoms, photonic) or to the variational track (ADAPT-VQE, quantum natural gradient, barren plateaus) for a different angle on the same fault-tolerance and gate-cost considerations that drive the algorithms decisions.


Weekly dispatch

Quantum, for people who already code.

One serious tutorial per week, plus the industry moves that actually matter. No hype, no hand-waving.

Free. Unsubscribe anytime. We will never sell your email.