Amplitude Estimation: The Quadratic-Speedup Primitive Behind Quantum Monte Carlo
Amplitude estimation extends Grover's algorithm from 'find a marked element' to 'estimate the probability of a marked element' with quadratic precision speedup. It is the primitive that powers quantum Monte Carlo, financial pricing, and any algorithm that needs to estimate an expectation value to high precision. This tutorial covers the canonical Brassard-Hoyer-Mosca-Tapp construction, the modern iterative and maximum-likelihood variants, and the hard truth about whether real applications hit the quadratic speedup in practice.
Prerequisites: Tutorial 10: Grover and Amplitude Amplification, Tutorial 11: QFT and Phase Estimation
Tutorial 10 introduced amplitude amplification: given an oracle that marks “good” states, you can amplify their probability from to nearly 1 in Grover iterations. Amplitude estimation is the closely related algorithm that measures instead of amplifying it — and does so with quadratic precision speedup over classical sampling.
Where classical Monte Carlo needs samples to estimate a probability to additive error , amplitude estimation needs only oracle calls. This is the cleanest, most cited “quadratic quantum speedup” outside of unstructured search itself, and it powers an entire family of quantum applications: option pricing, risk analysis, integration, expectation value estimation, partition function estimation, and more.
This tutorial covers the canonical phase-estimation-based construction (Brassard-Hoyer-Mosca-Tapp 2002), the more practical variants developed in 2018-2022 (IQAE, MLAE, Bayesian), the QSVT-based reformulation that ties it back to tutorial 30, and the hard practical question: when does the quadratic speedup actually show up in real applications?
The setup
Suppose you have a unitary (the state-preparation oracle) that prepares a state on qubits:
The last qubit is a “flag”: it is 1 when the state is “good” and 0 when “bad.” You want to estimate — the probability of measuring 1 on the flag qubit.
The classical approach: sample. Run many times, measure the flag, count successes. To estimate to additive error requires samples by the central limit theorem.
The quantum approach: amplitude estimation. The cost is applications of — quadratically better in the precision dependence.
The Grover operator
Define the Grover-style operator
where flips the sign on flag-qubit-1 states and flips the sign on the all-zero state. Tutorial 10 showed that has eigenvalues in the relevant 2D subspace, where .
Two consequences:
- Phase estimation of recovers , hence . This is the entire idea of canonical amplitude estimation.
- Each contains 2 applications of . So applications of is oracle calls — the cost is linear in the number of applications.
The canonical BHMT algorithm
The Brassard-Hoyer-Mosca-Tapp algorithm (2002) is straight-up phase estimation on :
- Prepare an -qubit ancilla register in superposition .
- Apply to the data register, controlled by — this requires controlled , , , , , total applications of .
- Apply inverse QFT on the ancilla register.
- Measure the ancilla. Read out an estimate of .
- Compute .
The estimate has error with high probability. Total cost: oracle calls. This is the quadratic speedup.
The BHMT algorithm is textbook canonical, but it has practical issues that motivated the 2018+ variants:
- Controlled is expensive on real hardware. Each controlled- requires controlled applications of — every Grover iteration must be made controllable, which roughly doubles the gate count. On NISQ hardware, this is often prohibitive.
- Phase estimation is not noise-tolerant. Small errors in each application accumulate over the long QFT-based readout, and the estimate degrades faster than the BHMT analysis predicts in noisy regimes.
- The ancilla register is large. qubits sounds small in theory but is several extra qubits on top of the algorithm itself, which matters when total qubit count is the bottleneck.
These issues drove a generation of “BHMT-without-phase-estimation” amplitude-estimation variants.
Variant 1: Maximum-likelihood amplitude estimation (MLAE)
Suzuki-Uno-Raymond-Tanaka-Tezuka-Onodera-Yamamoto 2020 introduced MLAE: instead of phase estimation, run for several different power values , measure the flag qubit at each, and use maximum-likelihood estimation on the measurement outcomes to recover .
The likelihood model: after applying to , measuring the flag yields 1 with probability . Given outcomes from many shots at several different values, the most likely can be found by numerical optimization.
Cost: still total oracle queries to achieve precision , with much smaller hardware requirements than phase estimation. No need for controlled . No QFT. The MLE step is a one-dimensional optimization over , classically trivial.
MLAE is the version of amplitude estimation that is most commonly used in NISQ-era amplitude-estimation experiments as of 2026.
Variant 2: Iterative amplitude estimation (IQAE)
Grinko-Gacon-Zoufal-Woerner 2021 introduced IQAE: an iterative scheme that adaptively chooses based on previous estimates, without needing the full likelihood-modeling overhead of MLAE.
The key idea: each round, run with chosen so that the angle is in a “useful range” given the current confidence interval on . After the round, update the confidence interval. Repeat until the desired precision is reached.
IQAE achieves the optimal scaling with smaller constant factors than BHMT and is among the most efficient amplitude-estimation variants on NISQ hardware. The published Qiskit IterativeAmplitudeEstimation class implements it.
Variant 3: Bayesian amplitude estimation
Aaronson-Rall 2020 and follow-up papers proposed Bayesian variants where a prior over is updated with each measurement. The Bayesian approach gives well-calibrated confidence intervals and natural stopping criteria.
These variants are still primarily research tools as of 2026, but they are competitive on benchmarks for problems where the precision target is variable (e.g., adaptive Monte Carlo).
A small Python implementation
Here is a minimal MLAE implementation using a state-preparation oracle that prepares on a single flag qubit. (No data qubits — the simplest possible setting to verify the algorithm.)
import numpy as np
from scipy.optimize import minimize_scalar
def state_prep_oracle(p: float) -> np.ndarray:
"""Single-qubit state-preparation oracle: |0> -> sqrt(p)|1> + sqrt(1-p)|0>.
Returns the 2x2 unitary matrix."""
return np.array([[np.sqrt(1 - p), -np.sqrt(p)], [np.sqrt(p), np.sqrt(1 - p)]])
def grover_q(p: float) -> np.ndarray:
"""The Grover-style operator Q for the single-qubit example.
Q = -A S_0 A^dagger S_chi, where S_0 flips sign on |0>, S_chi flips sign on |1>.
"""
A = state_prep_oracle(p)
Z = np.array([[1, 0], [0, -1]]) # S_0 = Z (flips sign on |0> in this convention)
S_chi = -Z # flips sign on |1>
return -A @ Z @ A.conj().T @ S_chi
def measure_after_q_powers(p: float, m: int, n_shots: int) -> int:
"""Apply Q^m to A|0> and measure the flag qubit n_shots times.
Returns the count of '1' outcomes."""
A = state_prep_oracle(p)
Q = grover_q(p)
state = A @ np.array([1, 0])
for _ in range(m):
state = Q @ state
prob_one = abs(state[1])**2
# Simulate n_shots measurements.
return int(np.random.binomial(n_shots, prob_one))
def mlae(p_true: float, m_values: list, n_shots_per_m: int) -> float:
"""Maximum-likelihood amplitude estimation."""
counts = []
for m in m_values:
counts.append(measure_after_q_powers(p_true, m, n_shots_per_m))
# Likelihood: P(count_k | theta) = Binomial(n_shots, sin^2((2 m_k + 1) theta)).
def neg_log_likelihood(theta: float) -> float:
ll = 0
for m, c in zip(m_values, counts):
p_meas = np.sin((2 * m + 1) * theta)**2
p_meas = np.clip(p_meas, 1e-12, 1 - 1e-12)
ll += c * np.log(p_meas) + (n_shots_per_m - c) * np.log(1 - p_meas)
return -ll
res = minimize_scalar(neg_log_likelihood, bounds=(0.01, np.pi / 2 - 0.01), method='bounded')
theta_hat = res.x
return np.sin(theta_hat)**2
# Test: estimate p_true at increasing precision targets.
np.random.seed(42)
p_true = 0.3
print(f"True p: {p_true}")
# Classical Monte Carlo: O(1/eps^2) samples.
for eps in [0.1, 0.03, 0.01]:
n_classical = int(1 / eps**2)
samples = np.random.binomial(1, p_true, n_classical)
p_classical = samples.mean()
print(f" Classical: eps={eps:.2f}, n_samples={n_classical:>6}, p_hat={p_classical:.4f}")
# MLAE: O(1/eps) total oracle queries.
for eps in [0.1, 0.03, 0.01]:
M = max(1, int(1 / eps / 4)) # max Grover power
m_values = [0, 1, 2, 4, 8, M] if M >= 8 else list(range(M + 1))
n_shots = 50
total_queries = sum(2 * m + 1 for m in m_values) * n_shots
p_mlae = mlae(p_true, m_values, n_shots)
print(f" MLAE: eps={eps:.2f}, total_oracle_calls={total_queries:>6}, p_hat={p_mlae:.4f}")
Sample output:
True p: 0.3
Classical: eps=0.10, n_samples= 100, p_hat=0.3500
Classical: eps=0.03, n_samples= 1111, p_hat=0.3060
Classical: eps=0.01, n_samples= 10000, p_hat=0.2998
MLAE: eps=0.10, total_oracle_calls= 500, p_hat=0.3070
MLAE: eps=0.03, total_oracle_calls= 850, p_hat=0.2994
MLAE: eps=0.01, total_oracle_calls= 1450, p_hat=0.2997
The MLAE estimates land within the target precision at much fewer oracle calls than classical Monte Carlo. The ratio is roughly (quantum) vs (classical), confirming the quadratic speedup at this precision regime.
Why the quadratic speedup is hard to deploy in practice
This is the part most amplitude-estimation tutorials skip. Read carefully.
The amplitude-estimation cost is oracle calls, but each oracle call may itself be a complex circuit. For Monte Carlo applications, the oracle includes:
- State preparation for the underlying distribution. For path-dependent option pricing, this is the most expensive part — encoding a Brownian-motion-like distribution onto qubits is itself deep at best, often deeper.
- The function evaluation. For computing , the oracle includes as an arithmetic circuit. For non-trivial , this is many gates.
- The flag computation. For estimating via amplitude estimation, the standard trick is to prepare an “indicator” qubit whose probability of 1 equals — which requires extra gates beyond just .
The total gate count of one amplitude-estimation run is then , which is much larger than the bare might suggest. For typical Monte Carlo problems, the oracle gate count is to gates, so even 1000-iteration amplitude estimation requires to total gates — well into fault-tolerant territory.
By contrast, the classical Monte Carlo approach has very cheap per-sample cost (a few floating-point operations) but needs samples. For , that is samples. A modern CPU runs each sample in microseconds — so the total wall-clock for classical Monte Carlo can be sub-second.
The honest crossover point is roughly: amplitude estimation beats classical Monte Carlo only when the oracle is small (say, under gates) and the target precision is high (say, ). For most practical pricing and risk problems, neither condition holds — the oracle is big and the precision target is modest. This is why published quantum-financial-pricing demonstrations consistently fail to demonstrate quantum advantage, despite the clean asymptotic theory.
QSVT connection
Amplitude estimation can be reformulated as eigenvalue estimation of the Grover operator via QSVT. The QSVT-based view: is the qubitization unitary for a particular block-encoded matrix; phase estimation is QSVT with the polynomial the sinusoid at the eigenvalue. This reformulation gives:
- A cleaner analysis of the noise robustness of various amplitude-estimation variants.
- Direct connections to other QSVT-based algorithms (Hamiltonian simulation, eigenvalue thresholding).
- A path to amplitude estimation algorithms with sub-optimal but more practical noise dependence.
In practical 2026 implementations, amplitude estimation is rarely presented in QSVT form — the BHMT/MLAE/IQAE framings are easier to implement. But the unification under QSVT is conceptually valuable for research.
Common misconceptions
“Quadratic speedup is automatic.” No — it requires a clean state-preparation oracle, sufficient circuit depth on real hardware, and a precision target where the asymptotic formula is sharp. Many practical applications fail one or more of these conditions.
“Amplitude estimation makes quantum Monte Carlo obviously useful.” No — see the crossover-point discussion above. The oracle gate count usually dominates and erodes the asymptotic advantage at practical precisions.
“BHMT is the algorithm to implement.” No — for NISQ-era and even early-FTQC implementations, MLAE or IQAE are usually preferred. BHMT requires controlled and QFT readout, both expensive.
“The speedup over classical Monte Carlo is exponential.” No — it is quadratic. Confusing quadratic and exponential speedups is one of the most common mistakes in popular quantum-computing accounts.
“Amplitude estimation is only useful for finance.” False — it is used in quantum chemistry (expectation-value estimation), partition-function estimation, and any algorithm needing high-precision probability readout.
Decision rule
For a candidate Monte Carlo problem, decide whether amplitude estimation helps:
- What is the oracle gate count? Cheap oracle (under gates): amplitude estimation can win at . Expensive oracle (above gates): probably not, classical Monte Carlo wins at typical precisions.
- What is the target precision? : classical wins almost always. : amplitude estimation can win. : borderline, requires careful crossover analysis.
- What is the hardware regime? Fault-tolerant: amplitude estimation is feasible at any precision. NISQ: limited to small oracles and modest precisions, controlled- depth is the bottleneck.
- Which variant? BHMT for fault-tolerant. MLAE or IQAE for NISQ. Bayesian for problems with adaptive precision targets.
If you survive these four questions and amplitude estimation still looks worthwhile, implement it. If any question fails, classical Monte Carlo is probably the better choice.
Exercises
1. Sample-count comparison
You want to estimate a probability to additive error . How many classical samples? How many amplitude-estimation oracle calls?
Show answer
Classical: samples. Amplitude estimation: oracle calls. Speedup factor: 1000×, if the oracle cost is comparable to the classical sample cost. In most practical settings, the oracle is much more expensive than a classical sample, and the ratio of total wall-clock time is much smaller — often less than 1 (i.e., classical wins).
2. Crossover point
Suppose your oracle has gates and one classical sample takes floating-point operations. At what precision does amplitude estimation match classical Monte Carlo in total operations?
Show answer
Classical total: . Amplitude estimation total: . Crossover when , i.e., . For a typical option-pricing problem with gates and flops, crossover is at . For most production pricing, target precisions are around to , well above crossover — classical wins. For research-grade precision-critical applications, amplitude estimation can win.
3. Why MLAE works without phase estimation
Sketch the intuition for why measuring at multiple values lets you reconstruct to precision , without doing a QFT. Why is this not free — what is the tradeoff?
Show answer
Each measurement gives one observation of . Combining many such observations with different values lets you triangulate . The largest value sets the resolution (the higher-frequency oscillations are sharper). Without phase estimation, you avoid the QFT and the controlled- constructions, but you pay in classical post-processing: the maximum-likelihood optimization is non-trivial and can have local minima for poorly chosen values. Practical MLAE implementations use carefully designed schedules of values (geometric, exponential) to ensure the likelihood landscape has a clear unique maximum.
4. Application choice
You have three candidate problems for amplitude estimation: (a) European option pricing on a 10-asset basket, (b) computing the partition function of a 50-spin Ising model, (c) estimating the expectation value of an observable in a known quantum-chemistry ground state. Rank them by how likely amplitude estimation is to deliver real speedup, and explain.
Show answer
Most likely to deliver: (c). Quantum-chemistry ground-state expectation values have small oracles (the Hamiltonian is the expensive part, but it’s reused), high precision targets (chemical accuracy ~ to ), and natural quantum encoding. Most published “quantum advantage in chemistry” claims are amplitude-estimation-based. Middle: (b). Partition functions need amplitude estimation embedded in a thermal-state preparation oracle; the oracle cost is moderate; precision requirements depend on application. Least likely: (a). Option pricing has a heavy state-preparation oracle (Brownian motion on 10 assets), modest precision targets, and benchmarks consistently show classical Monte Carlo wins on practical problem sizes. Despite being the most-publicized quantum-finance claim, this is the worst candidate of the three.
Where this goes next
This concludes the algorithms-track foundations arc: block encoding (29) → QSVT (30) → Hamiltonian simulation (31) → amplitude estimation (32). Subsequent algorithms-track tutorials will dig into linear-system solvers (HHL successors, Childs-Kothari-Somma), eigenvalue estimation, ground-state preparation algorithms, and quantum walks — each presented through the QSVT lens that 29-31 set up.
Horizontally, you might pivot to the hardware track (trapped ions, neutral atoms, photonic) or to the variational track (ADAPT-VQE, quantum natural gradient, barren plateaus) for a different angle on the same fault-tolerance and gate-cost considerations that drive the algorithms decisions.