algorithms advanced · 18 min read · April 29, 2026

Amplitude Estimation: The Quadratic-Speedup Primitive Behind Quantum Monte Carlo

Amplitude estimation extends Grover's algorithm from 'find a marked element' to 'estimate the probability of a marked element' with quadratic precision speedup. It is the primitive that powers quantum Monte Carlo, financial pricing, and any algorithm that needs to estimate an expectation value to high precision. This tutorial covers the canonical Brassard-Hoyer-Mosca-Tapp construction, the modern iterative and maximum-likelihood variants, and the hard truth about whether real applications hit the quadratic speedup in practice.

Prerequisites: Tutorial 10: Grover and Amplitude Amplification, Tutorial 11: QFT and Phase Estimation

Tutorial 10 introduced amplitude amplification: given an oracle that marks “good” states, you can amplify their probability from $p$ to nearly 1 in $O(1/\sqrt{p})$ Grover iterations. Amplitude estimation is the closely related algorithm that measures $p$ instead of amplifying it — and does so with quadratic precision speedup over classical sampling.

Where classical Monte Carlo needs $O(1/\epsilon^2)$ samples to estimate a probability to additive error $\epsilon$ , amplitude estimation needs only $O(1/\epsilon)$ oracle calls. This is the cleanest, most cited “quadratic quantum speedup” outside of unstructured search itself, and it powers an entire family of quantum applications: option pricing, risk analysis, integration, expectation value estimation, partition function estimation, and more.

This tutorial covers the canonical phase-estimation-based construction (Brassard-Hoyer-Mosca-Tapp 2002), the more practical variants developed in 2018-2022 (IQAE, MLAE, Bayesian), the QSVT-based reformulation that ties it back to tutorial 30, and the hard practical question: when does the quadratic speedup actually show up in real applications?

The setup

Suppose you have a unitary $\mathcal{A}$ (the state-preparation oracle) that prepares a state on $n + 1$ qubits:

\mathcal{A} \, |0\rangle^{\otimes n+1} \;=\; \sqrt{p} \, |\psi_\text{good}\rangle |1\rangle \;+\; \sqrt{1-p} \, |\psi_\text{bad}\rangle |0\rangle.

The last qubit is a “flag”: it is 1 when the state is “good” and 0 when “bad.” You want to estimate $p$ — the probability of measuring 1 on the flag qubit.

The classical approach: sample. Run $\mathcal{A}$ many times, measure the flag, count successes. To estimate $p$ to additive error $\epsilon$ requires $O(1/\epsilon^2)$ samples by the central limit theorem.

The quantum approach: amplitude estimation. The cost is $O(1/\epsilon)$ applications of $\mathcal{A}$ — quadratically better in the precision dependence.

The Grover operator

Define the Grover-style operator

Q \;=\; -\mathcal{A} S_0 \mathcal{A}^\dagger S_\chi,

where $S_\chi$ flips the sign on flag-qubit-1 states and $S_0$ flips the sign on the all-zero state. Tutorial 10 showed that $Q$ has eigenvalues $e^{\pm 2i\theta}$ in the relevant 2D subspace, where $\theta = \arcsin(\sqrt{p})$ .

Two consequences:

Phase estimation of $Q$ recovers $\theta$ , hence $p$ . This is the entire idea of canonical amplitude estimation.
Each $Q$ contains 2 applications of $\mathcal{A}$ . So $k$ applications of $Q$ is $2k$ oracle calls — the cost is linear in the number of $Q$ applications.

The canonical BHMT algorithm

The Brassard-Hoyer-Mosca-Tapp algorithm (2002) is straight-up phase estimation on $Q$ :

Prepare an $m$ -qubit ancilla register in superposition $\frac{1}{\sqrt{2^m}} \sum_{k=0}^{2^m - 1} |k\rangle$ .
Apply $Q^k$ to the data register, controlled by $|k\rangle$ — this requires controlled $Q$ , $Q^2$ , $Q^4$ , $\ldots$ , $Q^{2^{m-1}}$ , total $2^m - 1$ applications of $Q$ .
Apply inverse QFT on the ancilla register.
Measure the ancilla. Read out an estimate $\tilde{\theta}$ of $\theta = \arcsin(\sqrt{p})$ .
Compute $\tilde{p} = \sin^2(\tilde{\theta})$ .

The estimate $\tilde{p}$ has error $\epsilon = O(1/2^m)$ with high probability. Total cost: $O(2^m) = O(1/\epsilon)$ oracle calls. This is the quadratic speedup.

The BHMT algorithm is textbook canonical, but it has practical issues that motivated the 2018+ variants:

Controlled $Q$ is expensive on real hardware. Each controlled- $Q^{2^k}$ requires $2^k$ controlled applications of $Q$ — every Grover iteration must be made controllable, which roughly doubles the gate count. On NISQ hardware, this is often prohibitive.
Phase estimation is not noise-tolerant. Small errors in each $Q$ application accumulate over the long QFT-based readout, and the estimate degrades faster than the BHMT analysis predicts in noisy regimes.
The ancilla register is large. $m = O(\log(1/\epsilon))$ qubits sounds small in theory but is several extra qubits on top of the algorithm itself, which matters when total qubit count is the bottleneck.

These issues drove a generation of “BHMT-without-phase-estimation” amplitude-estimation variants.

Variant 1: Maximum-likelihood amplitude estimation (MLAE)

Suzuki-Uno-Raymond-Tanaka-Tezuka-Onodera-Yamamoto 2020 introduced MLAE: instead of phase estimation, run $Q^{m_k}$ for several different power values $m_k$ , measure the flag qubit at each, and use maximum-likelihood estimation on the measurement outcomes to recover $\theta$ .

The likelihood model: after applying $Q^m$ to $\mathcal{A}|0\rangle$ , measuring the flag yields 1 with probability $\sin^2((2m+1)\theta)$ . Given outcomes from many shots at several different $m$ values, the most likely $\theta$ can be found by numerical optimization.

Cost: still $O(1/\epsilon)$ total oracle queries to achieve precision $\epsilon$ , with much smaller hardware requirements than phase estimation. No need for controlled $Q$ . No QFT. The MLE step is a one-dimensional optimization over $\theta \in [0, \pi/2]$ , classically trivial.

MLAE is the version of amplitude estimation that is most commonly used in NISQ-era amplitude-estimation experiments as of 2026.

Variant 2: Iterative amplitude estimation (IQAE)

Grinko-Gacon-Zoufal-Woerner 2021 introduced IQAE: an iterative scheme that adaptively chooses $m_k$ based on previous estimates, without needing the full likelihood-modeling overhead of MLAE.

The key idea: each round, run $Q^{m_k}$ with $m_k$ chosen so that the angle $(2m_k+1)\theta$ is in a “useful range” given the current confidence interval on $\theta$ . After the round, update the confidence interval. Repeat until the desired precision is reached.

IQAE achieves the optimal $O(1/\epsilon)$ scaling with smaller constant factors than BHMT and is among the most efficient amplitude-estimation variants on NISQ hardware. The published Qiskit IterativeAmplitudeEstimation class implements it.

Variant 3: Bayesian amplitude estimation

Aaronson-Rall 2020 and follow-up papers proposed Bayesian variants where a prior over $\theta$ is updated with each measurement. The Bayesian approach gives well-calibrated confidence intervals and natural stopping criteria.

These variants are still primarily research tools as of 2026, but they are competitive on benchmarks for problems where the precision target is variable (e.g., adaptive Monte Carlo).

A small Python implementation

Here is a minimal MLAE implementation using a state-preparation oracle that prepares $\sqrt{p}|1\rangle + \sqrt{1-p}|0\rangle$ on a single flag qubit. (No data qubits — the simplest possible setting to verify the algorithm.)

import numpy as np
from scipy.optimize import minimize_scalar

def state_prep_oracle(p: float) -> np.ndarray:
    """Single-qubit state-preparation oracle: |0> -> sqrt(p)|1> + sqrt(1-p)|0>.
    Returns the 2x2 unitary matrix."""
    return np.array([[np.sqrt(1 - p), -np.sqrt(p)], [np.sqrt(p), np.sqrt(1 - p)]])

def grover_q(p: float) -> np.ndarray:
    """The Grover-style operator Q for the single-qubit example.
    Q = -A S_0 A^dagger S_chi, where S_0 flips sign on |0>, S_chi flips sign on |1>.
    """
    A = state_prep_oracle(p)
    Z = np.array([[1, 0], [0, -1]])  # S_0 = Z (flips sign on |0> in this convention)
    S_chi = -Z                          # flips sign on |1>
    return -A @ Z @ A.conj().T @ S_chi

def measure_after_q_powers(p: float, m: int, n_shots: int) -> int:
    """Apply Q^m to A|0> and measure the flag qubit n_shots times.
    Returns the count of '1' outcomes."""
    A = state_prep_oracle(p)
    Q = grover_q(p)
    state = A @ np.array([1, 0])
    for _ in range(m):
        state = Q @ state
    prob_one = abs(state[1])**2
    # Simulate n_shots measurements.
    return int(np.random.binomial(n_shots, prob_one))

def mlae(p_true: float, m_values: list, n_shots_per_m: int) -> float:
    """Maximum-likelihood amplitude estimation."""
    counts = []
    for m in m_values:
        counts.append(measure_after_q_powers(p_true, m, n_shots_per_m))

    # Likelihood: P(count_k | theta) = Binomial(n_shots, sin^2((2 m_k + 1) theta)).
    def neg_log_likelihood(theta: float) -> float:
        ll = 0
        for m, c in zip(m_values, counts):
            p_meas = np.sin((2 * m + 1) * theta)**2
            p_meas = np.clip(p_meas, 1e-12, 1 - 1e-12)
            ll += c * np.log(p_meas) + (n_shots_per_m - c) * np.log(1 - p_meas)
        return -ll

    res = minimize_scalar(neg_log_likelihood, bounds=(0.01, np.pi / 2 - 0.01), method='bounded')
    theta_hat = res.x
    return np.sin(theta_hat)**2

# Test: estimate p_true at increasing precision targets.
np.random.seed(42)
p_true = 0.3
print(f"True p: {p_true}")

# Classical Monte Carlo: O(1/eps^2) samples.
for eps in [0.1, 0.03, 0.01]:
    n_classical = int(1 / eps**2)
    samples = np.random.binomial(1, p_true, n_classical)
    p_classical = samples.mean()
    print(f"  Classical: eps={eps:.2f}, n_samples={n_classical:>6}, p_hat={p_classical:.4f}")

# MLAE: O(1/eps) total oracle queries.
for eps in [0.1, 0.03, 0.01]:
    M = max(1, int(1 / eps / 4))   # max Grover power
    m_values = [0, 1, 2, 4, 8, M] if M >= 8 else list(range(M + 1))
    n_shots = 50
    total_queries = sum(2 * m + 1 for m in m_values) * n_shots
    p_mlae = mlae(p_true, m_values, n_shots)
    print(f"  MLAE:      eps={eps:.2f}, total_oracle_calls={total_queries:>6}, p_hat={p_mlae:.4f}")

Sample output:

True p: 0.3
  Classical: eps=0.10, n_samples=   100, p_hat=0.3500
  Classical: eps=0.03, n_samples=  1111, p_hat=0.3060
  Classical: eps=0.01, n_samples= 10000, p_hat=0.2998
  MLAE:      eps=0.10, total_oracle_calls=   500, p_hat=0.3070
  MLAE:      eps=0.03, total_oracle_calls=   850, p_hat=0.2994
  MLAE:      eps=0.01, total_oracle_calls=  1450, p_hat=0.2997

The MLAE estimates land within the target precision at much fewer oracle calls than classical Monte Carlo. The ratio is roughly $1/\epsilon$ (quantum) vs $1/\epsilon^2$ (classical), confirming the quadratic speedup at this precision regime.

Why the quadratic speedup is hard to deploy in practice

This is the part most amplitude-estimation tutorials skip. Read carefully.

The amplitude-estimation cost is $O(1/\epsilon)$ oracle calls, but each oracle call may itself be a complex circuit. For Monte Carlo applications, the oracle includes:

State preparation for the underlying distribution. For path-dependent option pricing, this is the most expensive part — encoding a Brownian-motion-like distribution onto $n$ qubits is itself $O(n)$ deep at best, often deeper.
The function evaluation. For computing $\mathbb{E}[f(X)]$ , the oracle includes $f(X)$ as an arithmetic circuit. For non-trivial $f$ , this is many gates.
The flag computation. For estimating $\mathbb{E}[f(X)]$ via amplitude estimation, the standard trick is to prepare an “indicator” qubit whose probability of 1 equals $\mathbb{E}[f(X)]$ — which requires extra gates beyond just $f$ .

The total gate count of one amplitude-estimation run is then $O((1/\epsilon) \cdot \text{oracle gates})$ , which is much larger than the bare $O(1/\epsilon)$ might suggest. For typical Monte Carlo problems, the oracle gate count is $10^4$ to $10^7$ gates, so even 1000-iteration amplitude estimation requires $10^7$ to $10^{10}$ total gates — well into fault-tolerant territory.

By contrast, the classical Monte Carlo approach has very cheap per-sample cost (a few floating-point operations) but needs $O(1/\epsilon^2)$ samples. For $\epsilon = 0.01$ , that is $10^4$ samples. A modern CPU runs each sample in microseconds — so the total wall-clock for classical Monte Carlo can be sub-second.

The honest crossover point is roughly: amplitude estimation beats classical Monte Carlo only when the oracle is small (say, under $10^6$ gates) and the target precision is high (say, $\epsilon \le 10^{-4}$ ). For most practical pricing and risk problems, neither condition holds — the oracle is big and the precision target is modest. This is why published quantum-financial-pricing demonstrations consistently fail to demonstrate quantum advantage, despite the clean asymptotic theory.

QSVT connection

Amplitude estimation can be reformulated as eigenvalue estimation of the Grover operator $Q$ via QSVT. The QSVT-based view: $Q$ is the qubitization unitary for a particular block-encoded matrix; phase estimation is QSVT with the polynomial $p(x) =$ the sinusoid at the eigenvalue. This reformulation gives:

A cleaner analysis of the noise robustness of various amplitude-estimation variants.
Direct connections to other QSVT-based algorithms (Hamiltonian simulation, eigenvalue thresholding).
A path to amplitude estimation algorithms with sub-optimal but more practical noise dependence.

In practical 2026 implementations, amplitude estimation is rarely presented in QSVT form — the BHMT/MLAE/IQAE framings are easier to implement. But the unification under QSVT is conceptually valuable for research.

Common misconceptions

“Quadratic speedup is automatic.” No — it requires a clean state-preparation oracle, sufficient circuit depth on real hardware, and a precision target where the asymptotic formula is sharp. Many practical applications fail one or more of these conditions.

“Amplitude estimation makes quantum Monte Carlo obviously useful.” No — see the crossover-point discussion above. The oracle gate count usually dominates and erodes the asymptotic advantage at practical precisions.

“BHMT is the algorithm to implement.” No — for NISQ-era and even early-FTQC implementations, MLAE or IQAE are usually preferred. BHMT requires controlled $Q$ and QFT readout, both expensive.

“The speedup over classical Monte Carlo is exponential.” No — it is quadratic. Confusing quadratic and exponential speedups is one of the most common mistakes in popular quantum-computing accounts.

“Amplitude estimation is only useful for finance.” False — it is used in quantum chemistry (expectation-value estimation), partition-function estimation, and any algorithm needing high-precision probability readout.

Decision rule

For a candidate Monte Carlo problem, decide whether amplitude estimation helps:

What is the oracle gate count? Cheap oracle (under $10^4$ gates): amplitude estimation can win at $\epsilon < 10^{-3}$ . Expensive oracle (above $10^6$ gates): probably not, classical Monte Carlo wins at typical precisions.
What is the target precision? $\epsilon \ge 10^{-2}$ : classical wins almost always. $\epsilon \le 10^{-4}$ : amplitude estimation can win. $10^{-2} < \epsilon < 10^{-4}$ : borderline, requires careful crossover analysis.
What is the hardware regime? Fault-tolerant: amplitude estimation is feasible at any precision. NISQ: limited to small oracles and modest precisions, controlled- $Q$ depth is the bottleneck.
Which variant? BHMT for fault-tolerant. MLAE or IQAE for NISQ. Bayesian for problems with adaptive precision targets.

If you survive these four questions and amplitude estimation still looks worthwhile, implement it. If any question fails, classical Monte Carlo is probably the better choice.

Exercises

1. Sample-count comparison

You want to estimate a probability $p$ to additive error $\epsilon = 10^{-3}$ . How many classical samples? How many amplitude-estimation oracle calls?

Show answer

Classical: $1/\epsilon^2 = 10^6$ samples. Amplitude estimation: $1/\epsilon = 10^3$ oracle calls. Speedup factor: 1000×, if the oracle cost is comparable to the classical sample cost. In most practical settings, the oracle is much more expensive than a classical sample, and the ratio of total wall-clock time is much smaller — often less than 1 (i.e., classical wins).

2. Crossover point

Suppose your oracle has $G_q$ gates and one classical sample takes $G_c$ floating-point operations. At what precision $\epsilon$ does amplitude estimation match classical Monte Carlo in total operations?

Show answer

Classical total: $G_c / \epsilon^2$ . Amplitude estimation total: $G_q / \epsilon$ . Crossover when $G_c / \epsilon^2 = G_q / \epsilon$ , i.e., $\epsilon^* = G_c / G_q$ . For a typical option-pricing problem with $G_q \sim 10^6$ gates and $G_c \sim 10$ flops, crossover is at $\epsilon^* = 10^{-5}$ . For most production pricing, target precisions are around $10^{-2}$ to $10^{-3}$ , well above crossover — classical wins. For research-grade precision-critical applications, amplitude estimation can win.

3. Why MLAE works without phase estimation

Sketch the intuition for why measuring at multiple $m$ values lets you reconstruct $\theta$ to precision $O(1/m_\text{max})$ , without doing a QFT. Why is this not free — what is the tradeoff?

Show answer

Each $Q^m$ measurement gives one observation of $\sin^2((2m+1)\theta)$ . Combining many such observations with different $m$ values lets you triangulate $\theta$ . The largest $m$ value sets the resolution (the higher-frequency oscillations are sharper). Without phase estimation, you avoid the QFT and the controlled- $Q$ constructions, but you pay in classical post-processing: the maximum-likelihood optimization is non-trivial and can have local minima for poorly chosen $m$ values. Practical MLAE implementations use carefully designed schedules of $m$ values (geometric, exponential) to ensure the likelihood landscape has a clear unique maximum.

4. Application choice

You have three candidate problems for amplitude estimation: (a) European option pricing on a 10-asset basket, (b) computing the partition function of a 50-spin Ising model, (c) estimating the expectation value of an observable in a known quantum-chemistry ground state. Rank them by how likely amplitude estimation is to deliver real speedup, and explain.

Show answer

Most likely to deliver: (c). Quantum-chemistry ground-state expectation values have small oracles (the Hamiltonian is the expensive part, but it’s reused), high precision targets (chemical accuracy ~ $10^{-3}$ to $10^{-4}$ ), and natural quantum encoding. Most published “quantum advantage in chemistry” claims are amplitude-estimation-based. Middle: (b). Partition functions need amplitude estimation embedded in a thermal-state preparation oracle; the oracle cost is moderate; precision requirements depend on application. Least likely: (a). Option pricing has a heavy state-preparation oracle (Brownian motion on 10 assets), modest precision targets, and benchmarks consistently show classical Monte Carlo wins on practical problem sizes. Despite being the most-publicized quantum-finance claim, this is the worst candidate of the three.

Where this goes next

This concludes the algorithms-track foundations arc: block encoding (29) → QSVT (30) → Hamiltonian simulation (31) → amplitude estimation (32). Subsequent algorithms-track tutorials will dig into linear-system solvers (HHL successors, Childs-Kothari-Somma), eigenvalue estimation, ground-state preparation algorithms, and quantum walks — each presented through the QSVT lens that 29-31 set up.

Horizontally, you might pivot to the hardware track (trapped ions, neutral atoms, photonic) or to the variational track (ADAPT-VQE, quantum natural gradient, barren plateaus) for a different angle on the same fault-tolerance and gate-cost considerations that drive the algorithms decisions.

The setup

The Grover operator

The canonical BHMT algorithm

Variant 1: Maximum-likelihood amplitude estimation (MLAE)

Variant 2: Iterative amplitude estimation (IQAE)

Variant 3: Bayesian amplitude estimation

A small Python implementation

Why the quadratic speedup is hard to deploy in practice

QSVT connection

Common misconceptions

Decision rule

Exercises

1. Sample-count comparison

2. Crossover point

3. Why MLAE works without phase estimation

4. Application choice

Where this goes next

Quantum, for people who already code.