variational advanced · 18 min read · April 29, 2026

ADAPT-VQE: Building the Ansatz One Operator at a Time

ADAPT-VQE is one of the most widely cited barren-plateau mitigation strategies in quantum chemistry. Instead of fixing the ansatz in advance, ADAPT grows it one operator at a time from a problem-defined pool, each time picking the operator with the largest energy gradient. The resulting ansatz is typically shorter than UCCSD, more accurate at modest qubit counts, and easier to train. This tutorial covers the algorithm, the operator-pool design, the qubit-ADAPT variant, and the regimes where ADAPT wins versus where it doesn't.

Prerequisites: Tutorial 13: Variational Quantum Eigensolver, Tutorial 37: Barren Plateaus

Tutorial 37 explained why fixed-ansatz variational quantum eigensolvers struggle past modest qubit counts: barren plateaus. The standard chemistry ansatz, UCCSD, partly avoids this by being problem-tailored, but it has its own problem: it is too long. The full UCCSD ansatz for a molecule has $O(n^4)$ parameters and at least $O(n^4)$ gates, where $n$ is the qubit count. For modest molecules this is already hundreds of parameters; for chemically interesting larger molecules it becomes the dominant cost.

In 2019, Grimsley, Economou, Barnes, and Mayhall published ADAPT-VQE in Nature Communications: a variant of VQE that builds the ansatz adaptively, one operator at a time, from a problem-defined operator pool. The ansatz grows only as long as it must to reach chemical accuracy. The resulting ansatz is typically several times shorter than full UCCSD, which:

Keeps gradient training inside the trainable regime (mitigating barren plateaus).
Reduces gate count, lowering noise and shot requirements.
Improves accuracy at modest qubit counts.

ADAPT-VQE has become the dominant trainable VQE strategy in chemistry. This tutorial covers the algorithm, operator-pool choices, the qubit-ADAPT variant, and an honest assessment of when ADAPT wins versus when it doesn’t.

The ADAPT-VQE algorithm

The full pseudocode, in plain English:

Define an operator pool. A finite set of Hermitian operators $\{A_k\}$ from which the ansatz will be built. For chemistry, the pool is typically the set of fermionic single and double excitations (the same operators that appear in UCCSD).
Start with the reference state. Usually Hartree-Fock $|\psi_0\rangle$ — the classically computed mean-field starting point.
At each ADAPT iteration:
- For each operator $A_k$ in the pool, compute the gradient $g_k = -i \langle \psi | [H, A_k] | \psi \rangle$ at the current ansatz state $|\psi\rangle$. (The factor $-i$ makes $g_k$ real for Hermitian $A_k$; with anti-Hermitian generators it is absorbed into the operator.)
- Pick the operator $A_{k^*}$ with the largest gradient magnitude.
- Append $e^{-i \theta_{k^*} A_{k^*}}$ to the ansatz with a new variational parameter $\theta_{k^*}$.
- Run a full VQE optimization on all parameters in the current ansatz.
Termination. Stop when the gradient norm $\|g\|$ falls below a threshold, or the energy improvement per iteration is below tolerance, or the operator-pool budget is exhausted.

The output is a problem-tailored ansatz with as few operators as possible to reach the target accuracy.
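One screening step from the loop above can be sketched with dense matrices on two qubits. The Hamiltonian, the three-operator pool, and the reference state below are toy assumptions chosen for illustration, not a chemistry problem. The gradient is evaluated as $-i\langle\psi|[H, A_k]|\psi\rangle$, which is the derivative of the energy with respect to $\theta$ at $\theta = 0$ for the unitary $e^{-i\theta A_k}$.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def kron(*ops):
    """Tensor product of single-qubit operators, left to right."""
    out = np.array([[1.0 + 0j]])
    for op in ops:
        out = np.kron(out, op)
    return out


# Toy two-qubit Hamiltonian (an assumption for illustration only).
H = kron(Z, I2) + kron(I2, Z) + 0.5 * kron(X, X)

# Toy pool of Hermitian generators A_k (also an assumption).
pool = {"YX": kron(Y, X), "YI": kron(Y, I2), "ZZ": kron(Z, Z)}

# Reference state |11>, standing in for Hartree-Fock.
psi = np.zeros(4, dtype=complex)
psi[3] = 1.0


def pool_gradient(H, A, psi):
    """dE/dtheta at theta = 0 for exp(-i theta A): -i <psi|[H, A]|psi>."""
    comm = H @ A - A @ H
    return float(np.real(-1j * np.vdot(psi, comm @ psi)))


def energy(theta, A):
    """Energy after applying exp(-i theta A); uses A @ A = I for Pauli strings."""
    U = np.cos(theta) * np.eye(4) - 1j * np.sin(theta) * A
    phi = U @ psi
    return float(np.real(np.vdot(phi, H @ phi)))


grads = {name: pool_gradient(H, A, psi) for name, A in pool.items()}
best = max(grads, key=lambda name: abs(grads[name]))

# YX has gradient -1 for this state; the other two vanish.
print(grads, "-> selected:", best)
```

The `energy` helper is there only so the commutator value can be compared against a finite-difference slope; on hardware the commutator expectation would be estimated from measurements instead.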

The mathematical insight: the gradient of the energy with respect to a newly added parameter, evaluated at $\theta = 0$, is exactly a commutator expectation value (carrying a factor of $-i$ when the pool operators are Hermitian). If an operator’s gradient is large at the current state, adding it lets the optimizer lower the energy quickly; if it is small, the operator helps little to first order. ADAPT therefore selects the most useful operator at each iteration, rather than committing to all of them upfront.
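A short derivation of that statement, assuming the new operator enters as $e^{-i\theta A}$ with Hermitian $A$ and that $|\psi\rangle$ is the current ansatz state:

```latex
\begin{aligned}
E(\theta) &= \langle \psi | \, e^{i\theta A} \, H \, e^{-i\theta A} \, | \psi \rangle, \\
\left.\frac{\partial E}{\partial \theta}\right|_{\theta = 0}
  &= \langle \psi | \, (iA) H + H (-iA) \, | \psi \rangle
   = -i \, \langle \psi | [H, A] | \psi \rangle .
\end{aligned}
```

Since $[H, A]$ is anti-Hermitian when $H$ and $A$ are both Hermitian, its expectation value is purely imaginary and the prefactor $-i$ makes the gradient real. Writing the generator as the anti-Hermitian operator $\tau = -iA$ gives the equivalent form $\langle \psi | [H, \tau] | \psi \rangle$, which is how the result is usually quoted for fermionic excitation generators.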

The operator pool: fermionic vs qubit

The original ADAPT-VQE used fermionic excitation operators, mapped to qubits via Jordan-Wigner or Bravyi-Kitaev encoding. Fermionic operators have nice physical interpretation — single excitations are particle-hole moves, double excitations are pair correlations.

The downside: after the encoding, fermionic operators are non-local on the qubit register. Under Jordan-Wigner, a single fermionic excitation maps to Pauli strings acting on up to $O(n)$ qubits, so implementing each operator costs many CNOTs.
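The growth in Pauli weight is easy to see by writing out the strings. The sketch below assumes the standard Jordan-Wigner convention, under which $a_p^\dagger a_q - a_q^\dagger a_p$ (with $p < q$) becomes two strings of the form $X Z \cdots Z Y$ and $Y Z \cdots Z X$; coefficients are omitted. The CNOT estimate assumes the textbook CNOT-ladder circuit for a Pauli exponential, and optimized compilations can do better. The function names are illustrative, not from any library.

```python
def jw_single_excitation(p, q, n_qubits):
    """Pauli strings (coefficients omitted) for a_p^† a_q - a_q^† a_p, p < q."""
    if not 0 <= p < q < n_qubits:
        raise ValueError("need 0 <= p < q < n_qubits")

    def string(left, right):
        chars = ["I"] * n_qubits
        chars[p], chars[q] = left, right
        # Jordan-Wigner parity string on the qubits strictly between p and q.
        for j in range(p + 1, q):
            chars[j] = "Z"
        return "".join(chars)

    return [string("X", "Y"), string("Y", "X")]


def pauli_weight(pauli_string):
    """Number of qubits on which the string acts non-trivially."""
    return sum(c != "I" for c in pauli_string)


def ladder_cnots(pauli_string):
    """CNOTs in the textbook ladder circuit for exp(-i theta P): 2 * (weight - 1)."""
    return max(0, 2 * (pauli_weight(pauli_string) - 1))


for p, q in [(0, 1), (0, 5), (0, 11)]:
    strings = jw_single_excitation(p, q, 12)
    print(f"{p}->{q}: {strings}  CNOTs per string: {[ladder_cnots(s) for s in strings]}")
```

An excitation between neighbouring qubits costs 2 CNOTs per string in this model, while one spanning all 12 qubits costs 22.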

In 2021, Tang and co-authors (a team that included Mayhall, Barnes, and Economou) introduced qubit-ADAPT-VQE: use individual Pauli strings (products of $X$, $Y$, and $Z$ operators) as the pool elements instead of fermionic excitations. These strings act on only a few qubits, so each operator costs few CNOTs to implement. The pool is larger but each iteration is much cheaper, and the resulting ansatz typically uses ~3-5× more operators but ~3-5× fewer gates than fermionic-ADAPT.

The 2026 picture: qubit-ADAPT is the production choice for current hardware because gate counts dominate noise. Fermionic-ADAPT is preferred in resource estimates for fault-tolerant chemistry, where gate count is less constraining and structured operators are easier to compile.

What ADAPT actually delivers

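A sharp comparison: ADAPT-VQE vs UCCSD on H6 (a six-hydrogen chain, 12 qubits in a minimal basis). Full UCCSD has on the order of 100 parameters here (18 singles plus 99 doubles if every spin-conserving spin-orbital excitation is counted separately, fewer after spin adaptation); ADAPT-VQE typically reaches chemical accuracy with 15-30 operators. Energy convergence: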

UCCSD: $E_\text{UCCSD} - E_\text{exact} \sim 10^{-3}$ Hartree
ADAPT-VQE: $E_\text{ADAPT} - E_\text{exact} \sim 10^{-4}$ Hartree
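To put those errors on the usual chemistry scale: chemical accuracy is conventionally taken as 1 kcal/mol, which is about $1.6 \times 10^{-3}$ Hartree. A small conversion sketch follows; the helper names are illustrative and not part of any library.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509  # 1 Hartree expressed in kcal/mol
CHEMICAL_ACCURACY_KCAL_PER_MOL = 1.0


def to_kcal_per_mol(error_hartree):
    """Convert an energy error from Hartree to kcal/mol."""
    return error_hartree * HARTREE_TO_KCAL_PER_MOL


def within_chemical_accuracy(error_hartree):
    """True if the absolute error is at most 1 kcal/mol."""
    return to_kcal_per_mol(abs(error_hartree)) <= CHEMICAL_ACCURACY_KCAL_PER_MOL


for label, err in [("UCCSD", 1e-3), ("ADAPT-VQE", 1e-4)]:
    print(f"{label}: {to_kcal_per_mol(err):.3f} kcal/mol, "
          f"within chemical accuracy: {within_chemical_accuracy(err)}")
```

On this scale an error of $10^{-3}$ Hartree sits just inside the threshold, while $10^{-4}$ Hartree leaves roughly an order of magnitude of headroom.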

Two effects combine: ADAPT picks the most-useful operators (better physics), and ADAPT uses fewer parameters total (less overfitting on noisy gradient measurements).

Other small molecules show similar patterns. For BeH₂, LiH, H₄, and other small chemistry benchmarks, ADAPT typically halves the gate count for a given accuracy target and improves the achievable accuracy.

The hardware-efficient ansatz (HEA), by contrast, is structurally generic and pays the full barren-plateau cost. HEA on H6 typically reaches $\sim 10^{-2}$ Hartree accuracy at the same gate count, an order of magnitude worse than UCCSD and two orders worse than ADAPT.

A small ADAPT-VQE in PennyLane

A working sketch of ADAPT-VQE on H₂ — the simplest example, but enough to see the loop structure:

import numpy as np
import pennylane as qml
from pennylane import numpy as pnp

# H2 in the STO-3G basis: 4 spin-orbitals mapped to 4 qubits.
# Coordinates are in Bohr (PennyLane's default unit); the bond length is about 0.7 Å.
symbols = ["H", "H"]
coordinates = np.array([[0.0, 0.0, -0.6614], [0.0, 0.0, 0.6614]])

H, n_qubits = qml.qchem.molecular_hamiltonian(symbols, coordinates)

# Operator pool: all spin-conserving single and double excitations,
# each implemented by PennyLane's excitation gates.
electrons = 2
hf_state = qml.qchem.hf_state(electrons, n_qubits)
singles, doubles = qml.qchem.excitations(electrons, n_qubits)

# Each pool entry is (gate class, label, wires the gate acts on).
pool = [(qml.SingleExcitation, "single", wires) for wires in singles] + \
       [(qml.DoubleExcitation, "double", wires) for wires in doubles]

dev = qml.device("default.qubit", wires=n_qubits)


def make_circuit(operators_chosen, params):
    """Build the current ADAPT ansatz from chosen operators + their parameters."""
    @qml.qnode(dev)
    def circuit():
        qml.BasisState(hf_state, wires=range(n_qubits))
        for (op_class, op_type, op_wires), theta in zip(operators_chosen, params):
            op_class(theta, wires=op_wires)
        return qml.expval(H)
    return circuit


def gradient_for_op(operators_so_far, params_so_far, candidate_op):
    """Compute energy gradient when adding candidate_op with theta=0."""
    op_class, op_type, op_wires = candidate_op
    @qml.qnode(dev)
    def grad_circuit(theta):
        qml.BasisState(hf_state, wires=range(n_qubits))
        for (oc, _, ow), p in zip(operators_so_far, params_so_far):
            oc(p, wires=ow)
        op_class(theta, wires=op_wires)
        return qml.expval(H)

    # Only theta is differentiated; the existing parameters are held fixed.
    g = qml.grad(grad_circuit)(pnp.array(0.0, requires_grad=True))
    return abs(float(g))


# ADAPT loop
operators_chosen = []
params = []
threshold = 1e-3
max_iters = 8

for adapt_iter in range(max_iters):
    # Score each pool operator by its current gradient magnitude.
    grads = [gradient_for_op(operators_chosen, params, p) for p in pool]
    best_idx = int(np.argmax(grads))
    best_grad = grads[best_idx]

    print(f"Iter {adapt_iter}: max gradient = {best_grad:.3e}")
    if best_grad < threshold:
        print("Convergence reached.")
        break

    # Add the best operator to the ansatz.
    operators_chosen.append(pool[best_idx])
    params.append(0.0)

    # Optimize all parameters in the current ansatz.
    params_arr = pnp.array(params, requires_grad=True)
    opt = qml.AdamOptimizer(stepsize=0.1)

    def cost(p):
        circ = make_circuit(operators_chosen, p)
        return circ()

    for _ in range(100):
        params_arr = opt.step(cost, params_arr)
    # Store plain floats so the next screening pass treats them as constants.
    params = [float(p) for p in params_arr]

    energy = float(cost(params_arr))
    print(f"  After optimization: E = {energy:.6f} Ha (chosen ops: {len(operators_chosen)})")

Sample output (illustrative; exact values depend on the PennyLane version and optimizer settings):

Iter 0: max gradient = 1.567e-01
  After optimization: E = -1.135731 Ha (chosen ops: 1)
Iter 1: max gradient = 4.231e-03
  After optimization: E = -1.137272 Ha (chosen ops: 2)
Iter 2: max gradient = 6.124e-04
Convergence reached.

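In this run the loop stops after two operators. UCCSD on the same problem has three (two singles and one double), so the saving is small here; for larger molecules the pruning advantage compounds, and ADAPT typically uses ~25% of UCCSD’s operator count for the same accuracy. Two sanity checks are worth applying to any output like the one above. First, in minimal-basis H₂ the single excitations have zero gradient by symmetry, so the first operator selected should be the double excitation, and that one operator already spans the exact ground state. Second, the energy is variational: at this geometry the exact STO-3G ground-state energy is about −1.1362 Ha, and a correctly converged run should approach that value from above.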
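The UCCSD operator count for a given molecule can be reproduced by enumerating spin-conserving excitations from the Hartree-Fock reference. This is a pure-Python sketch that assumes interleaved spin ordering (even index spin-up, odd index spin-down) and is intended to mirror the selection rule of `qml.qchem.excitations` with its default settings; treat that correspondence as an assumption rather than a guarantee.

```python
from itertools import combinations


def count_excitations(electrons, n_spin_orbitals):
    """Count spin-conserving singles and doubles from a Hartree-Fock reference.

    Assumes the lowest `electrons` spin-orbitals are occupied and that
    even/odd indices carry spin-up/spin-down.
    """
    occupied = range(electrons)
    virtual = range(electrons, n_spin_orbitals)

    def spin(i):
        return i % 2

    singles = [(r, p) for r in occupied for p in virtual if spin(r) == spin(p)]
    doubles = [
        (r, s, p, q)
        for r, s in combinations(occupied, 2)
        for p, q in combinations(virtual, 2)
        if spin(r) + spin(s) == spin(p) + spin(q)
    ]
    return len(singles), len(doubles)


print("H2 (2 electrons, 4 spin-orbitals):", count_excitations(2, 4))
print("H6 (6 electrons, 12 spin-orbitals):", count_excitations(6, 12))
```

This counts each spin-orbital excitation separately. Spin-adapted formulations group excitations that share an amplitude and therefore report fewer independent parameters.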

Strengths of ADAPT-VQE

Fewer gates per accuracy target. ADAPT consistently produces shorter circuits than UCCSD or HEA at the same accuracy.
Better trainability. Each ADAPT iteration trains a small number of parameters; the cumulative ansatz grows slowly enough to stay in trainable territory.
Problem-tailored. The selected operators reflect the specific molecule’s electronic structure, not a generic ansatz template.
Natural termination criterion. The gradient threshold gives a principled stopping point. UCCSD doesn’t have this — you commit to the full ansatz upfront.
Compatible with classical pre-training. Hartree-Fock + ADAPT can be combined with classical-method warm starts (e.g., starting from a CCSD wavefunction approximation).

Weaknesses of ADAPT-VQE

Operator-selection cost. Each iteration requires computing gradients for every pool operator. For chemistry molecules with $\sim 10^3$ pool operators, this is $\sim 10^3$ extra circuit evaluations per iteration. Fast, but not negligible.
Greedy selection isn’t optimal. Picking the largest-gradient operator at each step is a heuristic. Some problems require operators that initially have small gradients but combine usefully with later additions.
Sensitive to the operator pool. A pool that doesn’t span the relevant subspace will fail to reach high accuracy regardless of how many operators are added. Pool design is a research area.
Doesn’t avoid all barren plateaus. ADAPT mitigates the initialization-time barren plateau by growing slowly, but the cumulative ansatz can still hit barren plateaus past a critical depth. ADAPT pushes the threshold; it does not eliminate it.
Harder to compile efficiently. Each ADAPT iteration changes the ansatz structure, requiring re-transpilation. UCCSD’s fixed structure is friendlier to compiler optimizations.

Common misconceptions

“ADAPT-VQE solves barren plateaus.” No. It mitigates them by keeping the cumulative ansatz short. Past a sufficiently large molecule, ADAPT-VQE itself enters barren-plateau territory. The mitigation is real but not a structural fix.

“ADAPT is just heuristic UCCSD.” It is heuristic, but the heuristic is principled (largest-gradient first) and produces measurably shorter ansätze than UCCSD across many benchmarks. The heuristic is good enough that ADAPT is the de facto chemistry choice for current hardware.

“Qubit-ADAPT is a strict improvement over fermionic-ADAPT.” Not in general. Qubit-ADAPT uses more operators but fewer gates per operator. The choice depends on whether gate count or operator count is more constraining: for noisy current hardware, qubit-ADAPT; for resource-estimated future hardware, fermionic-ADAPT often wins.

“ADAPT works the same way on chemistry and other Hamiltonians.” Mostly. The framework is generic; you can ADAPT for any Hamiltonian + operator pool. But the operator pool choice is the hard part for non-chemistry problems. For combinatorial optimization (where QAOA shines), ADAPT analogs exist but with less consensus on the right pool.

Decision rule

Use ADAPT-VQE when:

You’re doing chemistry at modest scale. ADAPT’s operator-selection structure aligns naturally with the locality and symmetry of molecular Hamiltonians.
Gate count is the binding constraint. ADAPT typically beats UCCSD and HEA on gate count for a given accuracy.
You want a problem-tailored ansatz without committing to all UCCSD operators.
You can afford the per-iteration gradient screening. This is the main computational overhead; if you cannot afford $\sim |\text{pool}|$ extra circuit evaluations per ADAPT step, fall back to fixed UCCSD.

Use UCCSD when:

You need a fixed-structure ansatz for compiler optimization or resource estimation.
The molecule is small enough that UCCSD is already not much longer than ADAPT.
You want classical-precomputable structure. UCCSD’s parameters can be initialized from classical CCSD; ADAPT’s adaptively-chosen operators have less classical pre-knowledge.

Use HEA when:

You don’t have a problem-natural pool and just need an ansatz that works.
The ansatz expressivity is more important than trainability — but be aware that HEA’s barren-plateau exposure is significant past 20 qubits.

For most current quantum-chemistry research in 2026, the answer is ADAPT first, with UCCSD as a fallback for benchmarking and HEA only as a last resort.

Exercises

1. Cost of operator screening

A chemistry pool has 200 operators. Each operator gradient costs 2 circuit evaluations. After 20 ADAPT iterations, how many circuit evaluations have been spent on screening alone, vs how many on full optimization with $\sim 100$ optimizer steps per iteration?

Answer

Screening per iteration: $200 \times 2 = 400$ evaluations. Across 20 iterations: 8,000 evaluations on screening. Optimization at ADAPT iteration $k$ takes $\sim 100$ steps, each needing one gradient per parameter for $k$ parameters, so cumulatively $\sim 100 \cdot \sum_{k=1}^{20} k = 100 \cdot 210 = 21{,}000$ parameter-gradient evaluations. At the same 2 circuit evaluations per gradient, that is $\sim 42{,}000$ circuit evaluations, roughly 5× the screening cost. This is why ADAPT’s overhead per iteration is dominated by the optimization, not the screening; the screening cost is real but not the limiting factor.
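The same bookkeeping as a few lines of Python, under the exercise's assumptions (one gradient per parameter per optimizer step, and the same two circuit evaluations per gradient as in screening). The optimization cost is reported both in parameter-gradient evaluations and in circuit evaluations, since the two conventions differ by that factor of two.

```python
pool_size = 200
evals_per_gradient = 2      # circuit evaluations per gradient, from the exercise
adapt_iterations = 20
optimizer_steps = 100       # optimizer steps per ADAPT iteration

# Screening: every pool operator is scored once per ADAPT iteration.
screening_circuits = pool_size * evals_per_gradient * adapt_iterations

# Optimization: at iteration k the ansatz has k parameters, each needing
# one gradient per optimizer step.
gradient_evals = optimizer_steps * sum(range(1, adapt_iterations + 1))
optimization_circuits = gradient_evals * evals_per_gradient

print("screening circuits:      ", screening_circuits)      # 8000
print("optimization gradients:  ", gradient_evals)          # 21000
print("optimization circuits:   ", optimization_circuits)   # 42000
print("circuit ratio opt/screen:", optimization_circuits / screening_circuits)
```

Changing `pool_size` shows where the balance flips: with a pool in the thousands, screening can overtake optimization for short ansätze.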

2. Why qubit-ADAPT can beat fermionic-ADAPT

Fermionic single excitations on a 12-qubit chain mapped via Jordan-Wigner have ~12 Pauli terms each, requiring ~24 CNOT gates per operator. A typical Pauli string in a qubit-ADAPT pool has 2-3 Pauli terms, requiring ~4-6 CNOTs. For an ansatz of 30 fermionic-ADAPT operators or 100 qubit-ADAPT operators reaching the same accuracy, compute total CNOT counts and discuss tradeoffs.

Answer

Fermionic: $30 \times 24 = 720$ CNOTs. Qubit: $100 \times 5 = 500$ CNOTs. Qubit-ADAPT wins on total gate count by ~30% even though it uses more operators. With current hardware noise of $\sim 3 \times 10^{-3}$ per CNOT and independent errors, the chance of at least one CNOT error is $1 - 0.997^{720} \approx 88\%$ for the fermionic ansatz and $1 - 0.997^{500} \approx 78\%$ for qubit-ADAPT. Put the other way, the error-free probability roughly doubles, from about 11% to about 22%. The difference is modest but real, and it matters for whether the algorithm is feasible at all on current hardware. For NISQ-era execution, optimize for gate count, not operator count.
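The arithmetic in this answer can be checked directly. The sketch assumes independent CNOT errors at the quoted rate and ignores single-qubit gates, readout, and decoherence, so it is a rough bound rather than a hardware prediction.

```python
cnot_error = 3e-3  # per-CNOT error rate quoted in the exercise


def total_cnots(n_operators, cnots_per_operator):
    return n_operators * cnots_per_operator


def error_free_probability(n_cnots, error_rate=cnot_error):
    """Probability that no CNOT fails, assuming independent errors."""
    return (1.0 - error_rate) ** n_cnots


fermionic_cnots = total_cnots(30, 24)
qubit_cnots = total_cnots(100, 5)

for label, n in [("fermionic-ADAPT", fermionic_cnots), ("qubit-ADAPT", qubit_cnots)]:
    p_ok = error_free_probability(n)
    print(f"{label}: {n} CNOTs, error-free probability {p_ok:.3f}, "
          f"failure probability {1 - p_ok:.3f}")
```

Under these assumptions the error-free probabilities come out near 0.11 and 0.22.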

3. ADAPT termination criteria

A run reports gradient norm $10^{-3}$ Hartree after 25 iterations and energy improvement of $10^{-4}$ Hartree per iteration. Should you continue, and if so, what would you check?

Answer

You’re near the chemical-accuracy threshold of about $1.6 \times 10^{-3}$ Hartree (a $10^{-3}$ Hartree gradient norm corresponds roughly to $\sim 10^{-4}$ Hartree energy uncertainty). Check: (a) is your accuracy target chemical accuracy or tighter? (b) Are subsequent iterations still adding distinct operators or recycling already-chosen ones? (c) Has the energy plateaued or are you still improving by $10^{-4}$ Ha per step? If the energy is still improving and you have not yet reached your accuracy target, continue. If the energy has plateaued, stop: additional operators won’t help and may overfit on noisy gradient measurements. Practical heuristic: stop when both gradient norm and energy delta are below threshold for 2 consecutive iterations.
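The practical heuristic at the end of this answer can be written as a small stopping check. The function name, argument names, and default tolerances below are hypothetical choices for illustration; pick tolerances that match your own accuracy target.

```python
def should_stop(grad_norms, energies, grad_tol=1e-3, energy_tol=1e-4, patience=2):
    """True once gradient norm and energy change are both below tolerance
    for `patience` consecutive ADAPT iterations.

    grad_norms[i] and energies[i] are the values recorded at iteration i.
    """
    # Need `patience` energy differences, hence patience + 1 energies.
    if len(grad_norms) < patience or len(energies) < patience + 1:
        return False

    recent_grads = grad_norms[-patience:]
    recent_deltas = [
        abs(energies[-i] - energies[-i - 1]) for i in range(1, patience + 1)
    ]
    return (all(g < grad_tol for g in recent_grads)
            and all(d < energy_tol for d in recent_deltas))


# Energies still moving by 5e-4 Ha: keep going.
print(should_stop([0.2, 0.05, 5e-4, 3e-4], [-1.0, -1.1, -1.1005, -1.10051]))
# Both criteria satisfied twice in a row: stop.
print(should_stop([0.2, 0.05, 5e-4, 3e-4], [-1.0, -1.1, -1.10005, -1.10006]))
```

Requiring two consecutive iterations guards against stopping on a single noisy gradient estimate.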

4. When ADAPT is a bad choice

Describe a quantum-chemistry problem where ADAPT-VQE would not be a good choice.

Answer

Three scenarios where ADAPT struggles: (a) Strongly correlated systems where the operator pool is poorly designed (e.g., transition-metal complexes with multireference character that single + double excitations from Hartree-Fock can’t capture). (b) Very small systems where UCCSD is already short and the ADAPT screening overhead exceeds the operator-savings benefit. (c) Systems where the optimization landscape has many shallow local minima that ADAPT’s greedy selection traps you in early: the operator picked at iteration 5 might not combine optimally with the operator picked at iteration 7, and ADAPT cannot revise its earlier picks. Multireference systems (FeMoco, Cr₂, transition-metal catalysis) often need a richer pool or alternative ansätze (e.g., contextual subspace methods). ADAPT is the production choice for single-reference chemistry; it is not a universal solution.

Where this goes next

Tutorial 39 covers the parameter-shift rule — the standard exact-gradient computation method that makes ADAPT and other variational algorithms feasible on real quantum hardware. Tutorial 40 covers quantum natural gradient, which can further improve training stability on barren-plateau-adjacent landscapes.
