variational advanced · 14 min read · By LIPAI WANG · May 1, 2026

Hamiltonian Variational Ansatz: How to Build Trainable Ansätze from the Problem Itself

The Hamiltonian variational ansatz (HVA) builds variational circuits directly from the structure of the target Hamiltonian. Unlike generic hardware-efficient ansätze, HVA inherits problem symmetries, often avoids barren plateaus, and naturally connects to adiabatic quantum computing. This tutorial covers HVA construction, the connection to the quantum approximate optimization algorithm (QAOA, tutorial 14), and the design principles that make problem-tailored ansätze the production choice for variational chemistry and optimization.

Prerequisites: Tutorial 14: QAOA for Combinatorial Optimization, Tutorial 37: Barren Plateaus, Tutorial 65: Imaginary-Time Evolution

Tutorial 37 covered barren plateaus: the structural reason hardware-efficient ansätze (HEA) fail to train at scale. The recommended escape route was problem-tailored ansätze — variational circuits built from the structure of the target problem rather than from generic gate templates.

The most successful family of problem-tailored ansätze is the Hamiltonian variational ansatz (HVA). Given a target Hamiltonian $H$, decompose it as $H = \sum_i H_i$ (a sum of “easy-to-implement” terms). Build the ansatz as a sequence of layers, each applying $e^{-i \theta_{i,\ell} H_i}$ with a trainable parameter $\theta_{i,\ell}$ for term $i$ in layer $\ell$. The ansatz is now problem-specific by construction.

HVA inherits properties from the underlying physics:

Symmetries are preserved. If $H$ commutes with some symmetry operator (particle number, spin, parity), the HVA layers preserve it too.
Adiabatic connection. HVA can be derived as a discretization of an adiabatic evolution — providing both a starting point and a convergence argument.
Trainability. HVA often remains trainable at depths and qubit counts where HEA hits barren plateaus.
QAOA is a special case. The QAOA ansatz of tutorial 14 is exactly an HVA for combinatorial optimization Hamiltonians.

This tutorial covers HVA construction, the design principles, the QAOA connection, and the regimes where HVA wins over alternatives.

The HVA construction

Given a Hamiltonian $H = H_A + H_B$ split into two non-commuting parts, the basic 1-layer HVA is

$$|\psi(\boldsymbol{\theta})\rangle \;=\; e^{-i \theta_2 H_B} \, e^{-i \theta_1 H_A} |\psi_0\rangle,$$

where $|\psi_0\rangle$ is a chosen reference state (often the ground state of $H_A$ or $H_B$, whichever is easy to prepare classically).

For depth $p$, the HVA has $2p$ parameters:

$$|\psi(\boldsymbol{\theta})\rangle \;=\; \prod_{\ell=1}^{p} e^{-i \theta_{2\ell} H_B} \, e^{-i \theta_{2\ell-1} H_A} |\psi_0\rangle.$$

For more general $H = \sum_i H_i$, the HVA layer applies each $H_i$ in sequence:

$$\text{HVA layer} \;=\; \prod_i e^{-i \theta_{i,\ell} H_i}.$$

The exact structure (which terms to apply, in what order) is a design choice that depends on the Hamiltonian and the problem-specific physics.
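As a minimal sketch of the construction above, the following NumPy code builds a depth-$p$ HVA state for a two-term split using dense matrices. The model (a 3-qubit chain with $H_A = \sum_i Z_i Z_{i+1}$ and $H_B = \sum_i X_i$), the reference state, and the helper names `op_on`, `evolve`, and `hva_state` are illustrative assumptions, not part of this tutorial's PennyLane code.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def op_on(n, ops):
    """Tensor product over n qubits; `ops` maps qubit index -> 2x2 matrix."""
    out = np.array([[1.0 + 0j]])
    for q in range(n):
        out = np.kron(out, ops.get(q, I2))
    return out


def evolve(H, theta, state):
    """Apply exp(-i * theta * H) to `state` using the eigendecomposition of Hermitian H."""
    evals, evecs = np.linalg.eigh(H)
    return evecs @ (np.exp(-1j * theta * evals) * (evecs.conj().T @ state))


def hva_state(thetas, H_A, H_B, psi0):
    """Depth-p HVA with thetas = [th_1, ..., th_2p].

    thetas[0::2] (th_1, th_3, ...) drive H_A; thetas[1::2] (th_2, th_4, ...) drive H_B.
    """
    state = psi0
    for th_a, th_b in zip(thetas[0::2], thetas[1::2]):
        state = evolve(H_A, th_a, state)  # H_A acts first within a layer
        state = evolve(H_B, th_b, state)
    return state


n = 3
H_A = sum(op_on(n, {i: Z, i + 1: Z}) for i in range(n - 1))  # ZZ couplings
H_B = sum(op_on(n, {i: X}) for i in range(n))                # transverse field
H = H_A + H_B

psi0 = np.zeros(2**n, dtype=complex)
psi0[0b010] = 1.0  # reference state |010>, one of the ground states of H_A

thetas = np.array([0.3, -0.2, 0.1, 0.4])  # p = 2 layers -> 2p = 4 parameters
psi = hva_state(thetas, H_A, H_B, psi0)
energy = np.real(psi.conj() @ H @ psi)
print(f"norm = {np.linalg.norm(psi):.6f}, energy = {energy:.4f}")
```

Dense matrices only scale to a handful of qubits, but they make the layer ordering and the parameter count explicit before moving to a circuit library.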

Why HVA preserves symmetries

If $H$ commutes with a symmetry operator $S$ (e.g., $S$ is the particle-number operator and $[S, H] = 0$), then each summand $H_i$ may or may not commute with $S$. If every $H_i$ commutes with $S$, then every $e^{-i \theta H_i}$ commutes with $S$, so the HVA preserves the symmetry exactly and the state stays in the symmetry sector of the reference state $|\psi_0\rangle$.

For chemistry: the molecular Hamiltonian conserves particle number $N$ and spin ($S^2$, $S_z$). UCCSD excitation operators (singles and doubles, spin-adapted where $S^2$ matters) conserve the same quantities. An HVA-style ansatz built from UCCSD terms therefore stays in the physical symmetry sector and never “wanders” into nonphysical states.

This is more than a curiosity: symmetry preservation restricts the search to a single symmetry sector, which can be orders of magnitude smaller than the full Hilbert space. The optimization becomes easier because the optimizer doesn’t waste effort exploring nonphysical regions.
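The commutation argument can be checked numerically on two qubits. In this sketch the symmetry operator is the total magnetization $S = Z_0 + Z_1$ (used here as a stand-in for particle number), and the helper names `comm_norm` and `exp_i` are assumptions made for illustration. It compares three candidate generators: $ZZ$, the grouped term $XX + YY$, and $XX$ on its own.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

XX, YY, ZZ = np.kron(X, X), np.kron(Y, Y), np.kron(Z, Z)
S = np.kron(Z, I2) + np.kron(I2, Z)  # total magnetization Z_0 + Z_1


def comm_norm(A, B):
    """Frobenius norm of the commutator [A, B]; zero means A and B commute."""
    return np.linalg.norm(A @ B - B @ A)


def exp_i(H, theta):
    """Matrix exp(-i * theta * H) for Hermitian H."""
    evals, evecs = np.linalg.eigh(H)
    return (evecs * np.exp(-1j * theta * evals)) @ evecs.conj().T


theta = 0.37
for name, term in [("ZZ", ZZ), ("XX + YY", XX + YY), ("XX alone", XX)]:
    print(f"{name:9s} |[S, H_i]| = {comm_norm(S, term):.3f}   "
          f"|[S, exp(-i theta H_i)]| = {comm_norm(S, exp_i(term, theta)):.3f}")
```

The grouped term $XX + YY$ commutes with $S$ while $XX$ alone does not, so how the Hamiltonian is split into generators decides whether a symmetry survives. Giving the $XX$ and $YY$ rotations on a bond independent angles generally breaks total-$S_z$ conservation; tying them to a shared angle restores it.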

Why HVA helps with barren plateaus

The barren-plateau theorems (tutorial 37) assume the ansatz is sufficiently random: close to a unitary 2-design. At typical depths, HVA is not. Its generators $H_i$ are the terms of the problem Hamiltonian itself, so the circuit respects the problem’s structure and symmetries instead of scrambling the state into Haar-typical territory.

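Concrete result (Cerezo-Sone-Volkoff-Cincio-Coles 2021 and follow-ups): for layered circuits at logarithmic depth with local cost functions, gradients vanish only polynomially in the number of qubits, so there is no barren plateau; shallow HVA circuits sit in this favorable regime. At polynomial depth, plateaus can appear, but typically less severely than for HEA at the same depth.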

For molecules of moderate size (10-50 qubits), HVA is consistently trainable at depths of 5-50, while HEA at the same depth and qubit count runs into severe plateaus. This trainability gap is the practical reason HVA dominates production VQE.
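Barren-plateau statements are statements about the variance of a cost gradient over random parameter choices, and that variance can be estimated directly on small systems. The sketch below does this for a two-term HVA on a transverse-field Ising chain using central finite differences. The model, the sampling range, and the function names `energy` and `grad_variance` are assumptions for illustration; it does not include an HEA baseline, so it shows how to measure the quantity rather than reproducing the comparison above.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def op_on(n, ops):
    """Tensor product over n qubits; `ops` maps qubit index -> 2x2 matrix."""
    out = np.array([[1.0 + 0j]])
    for q in range(n):
        out = np.kron(out, ops.get(q, I2))
    return out


def energy(thetas, H, eig_A, eig_B, psi0):
    """<psi|H|psi> after alternating exp(-i th H_A), exp(-i th H_B) layers (H_A first)."""
    state = psi0
    for k, th in enumerate(thetas):
        evals, evecs = eig_A if k % 2 == 0 else eig_B
        state = evecs @ (np.exp(-1j * th * evals) * (evecs.conj().T @ state))
    return np.real(state.conj() @ H @ state)


def grad_variance(n, p, n_samples=200, eps=1e-4, seed=0):
    """Sample variance of dE/d(theta_1) over uniformly random parameters."""
    rng = np.random.default_rng(seed)
    H_A = -sum(op_on(n, {i: Z, i + 1: Z}) for i in range(n - 1))
    H_B = -sum(op_on(n, {i: X}) for i in range(n))
    H = H_A + H_B
    eig_A, eig_B = np.linalg.eigh(H_A), np.linalg.eigh(H_B)
    psi0 = np.full(2**n, 2 ** (-n / 2), dtype=complex)  # |+>^n, ground state of H_B
    shift = np.zeros(2 * p)
    shift[0] = eps  # differentiate with respect to the first parameter only
    grads = []
    for _ in range(n_samples):
        thetas = rng.uniform(-np.pi, np.pi, size=2 * p)
        e_plus = energy(thetas + shift, H, eig_A, eig_B, psi0)
        e_minus = energy(thetas - shift, H, eig_A, eig_B, psi0)
        grads.append((e_plus - e_minus) / (2 * eps))
    return np.var(grads)


for n in (2, 4, 6):
    print(f"n = {n}, p = {n}: Var[dE/dtheta_1] = {grad_variance(n, p=n):.4e}")
```

Plotting this variance against the qubit count on a log scale is the usual diagnostic: exponential decay signals a plateau, polynomial decay does not. Sizes reachable with dense matrices are too small to settle the question, so treat the output as a sanity check on the method.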

The adiabatic connection

HVA has a deep connection to adiabatic quantum computing. Suppose you start with a Hamiltonian $H_0$ whose ground state $|\psi_0\rangle$ is easy to prepare, and you want the ground state of a target $H_T$. Define an interpolation:

$$H(s) \;=\; (1 - s) H_0 + s H_T, \quad s \in [0, 1].$$

The adiabatic theorem says: if the gap above the ground state of $H(s)$ never closes and you sweep $s$ from 0 to 1 slowly enough (on a timescale set by the inverse of the minimum gap), the state stays in the ground state of $H(s)$ throughout, ending at the ground state of $H_T$.

Discretizing this evolution into Trotter steps:

$$|\psi_T\rangle \;\approx\; \prod_{\ell=1}^{p} e^{-i s_\ell H_T \delta t} \, e^{-i (1-s_\ell) H_0 \delta t} |\psi_0\rangle.$$

This is exactly an HVA with $H_A = H_0$ and $H_B = H_T$, and parameters $\theta_{2\ell-1} = (1 - s_\ell) \delta t$, $\theta_{2\ell} = s_\ell \delta t$. The HVA is a parameterized version of the adiabatic discretization, where the parameters are optimized rather than chosen by the adiabatic schedule.

This connection gives HVA two practical advantages:

Adiabatic-schedule warm starts. Initialize HVA parameters from a (possibly coarse) adiabatic schedule. The optimizer then refines.
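Convergence guarantees. At sufficient depth, and provided the gap of $H(s)$ stays open, the adiabatic theorem guarantees that some parameter setting lets HVA represent the ground state; the deeper you go, the closer the approximation can be. This is a guarantee about what the ansatz can express, not a promise that the optimizer will find those parameters.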
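The warm-start idea can be sketched in a few lines: pick a linear schedule $s_\ell = \ell / p$, set the $H_0$ angle in layer $\ell$ to $(1 - s_\ell)\,\delta t$ and the $H_T$ angle to $s_\ell\,\delta t$, and use the result as the optimizer's starting point. The 2-qubit toy problem, the total evolution time, and the names `adiabatic_warm_start` and `evolve` below are assumptions chosen for illustration.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def adiabatic_warm_start(p, total_time):
    """Linear schedule s_l = l / p, returned as [th_1, ..., th_2p].

    Even positions (0-based) hold the H_0 angles (1 - s_l) * dt,
    odd positions hold the H_T angles s_l * dt.
    """
    dt = total_time / p
    s = np.arange(1, p + 1) / p
    thetas = np.empty(2 * p)
    thetas[0::2] = (1 - s) * dt
    thetas[1::2] = s * dt
    return thetas


def evolve(H, theta, state):
    """Apply exp(-i * theta * H) to `state` for Hermitian H."""
    evals, evecs = np.linalg.eigh(H)
    return evecs @ (np.exp(-1j * theta * evals) * (evecs.conj().T @ state))


# Toy problem (assumed): transverse-field driver and a classical 2-qubit target.
H_0 = -(np.kron(X, I2) + np.kron(I2, X))     # ground state |++>
H_T = np.kron(Z, Z) + 0.5 * np.kron(Z, I2)   # ground state |10>, energy -1.5

psi0 = np.full(4, 0.5, dtype=complex)        # |++>
thetas = adiabatic_warm_start(p=40, total_time=12.0)

state = psi0
for th_0, th_T in zip(thetas[0::2], thetas[1::2]):
    state = evolve(H_0, th_0, state)  # H_A = H_0 acts first in each layer
    state = evolve(H_T, th_T, state)  # H_B = H_T acts second

e_start = np.real(psi0.conj() @ H_T @ psi0)
e_warm = np.real(state.conj() @ H_T @ state)
e_exact = np.linalg.eigvalsh(H_T)[0]
print(f"reference state: {e_start:.3f}, warm start: {e_warm:.3f}, exact: {e_exact:.3f}")
```

In practice the schedule would be much coarser than this (small $p$), with the optimizer expected to close the remaining gap to the exact energy.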

QAOA as an HVA special case

Tutorial 14 introduced QAOA. The QAOA ansatz is

$$|\psi(\boldsymbol{\beta}, \boldsymbol{\gamma})\rangle \;=\; \prod_{\ell=1}^{p} e^{-i \beta_\ell H_M} \, e^{-i \gamma_\ell H_C} |+\rangle^{\otimes n},$$

where $H_C$ is the cost Hamiltonian (encoding the optimization problem) and $H_M = \sum_i X_i$ is the mixer Hamiltonian. This is exactly an HVA with $H_A = H_C$ and $H_B = H_M$, since the cost layer acts on the state first in each layer. The $|+\rangle^{\otimes n}$ initial state is the ground state of $-H_M$.

The QAOA literature is essentially the HVA literature applied to combinatorial optimization. The depth-$p$ QAOA has $2p$ parameters; performance improves with $p$ for many problems but not universally (the “QAOA depth-quality” question).
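To make the correspondence concrete, here is a minimal sketch of depth-1 QAOA on the smallest possible instance: MaxCut on a single edge, with cost $H_C = Z_0 Z_1$. The instance and the names `evolve` and `qaoa_energy` are assumptions for illustration. The code applies the cost layer first and the mixer second, matching the operator order in the QAOA formula above.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

H_C = np.kron(Z, Z)                     # cost: lowest when the two bits differ
H_M = np.kron(X, I2) + np.kron(I2, X)   # transverse-field mixer
plus = np.full(4, 0.5, dtype=complex)   # |+>|+>


def evolve(H, theta, state):
    """Apply exp(-i * theta * H) to `state` for Hermitian H."""
    evals, evecs = np.linalg.eigh(H)
    return evecs @ (np.exp(-1j * theta * evals) * (evecs.conj().T @ state))


def qaoa_energy(gammas, betas):
    """<H_C> after p layers of exp(-i beta H_M) exp(-i gamma H_C) applied to |+>|+>."""
    state = plus
    for gamma, beta in zip(gammas, betas):
        state = evolve(H_C, gamma, state)  # cost layer acts first
        state = evolve(H_M, beta, state)   # mixer layer acts second
    return np.real(state.conj() @ H_C @ state)


# For this single-edge instance the p = 1 energy works out to sin(4*beta) * sin(2*gamma),
# so gamma = -pi/4, beta = pi/8 reaches the minimum value of -1 (the optimal cut).
gamma, beta = -np.pi / 4, np.pi / 8
print(f"<H_C> = {qaoa_energy([gamma], [beta]):.6f}")
```

A single edge is solved exactly at $p = 1$; larger graphs generally need more layers, which is where the depth-quality question starts.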

A small HVA demonstration

Concrete code building a Hamiltonian variational ansatz for a Heisenberg-style chain:

import pennylane as qml
import numpy as np
from pennylane import numpy as pnp

n_qubits = 4

# Heisenberg Hamiltonian: H = sum_i (X_i X_{i+1} + Y_i Y_{i+1} + Z_i Z_{i+1})
def heisenberg_terms(n):
    terms = []
    coeffs = []
    for i in range(n - 1):
        terms.append(qml.PauliX(i) @ qml.PauliX(i + 1))
        terms.append(qml.PauliY(i) @ qml.PauliY(i + 1))
        terms.append(qml.PauliZ(i) @ qml.PauliZ(i + 1))
        coeffs.extend([1.0, 1.0, 1.0])
    return qml.Hamiltonian(coeffs, terms)


H = heisenberg_terms(n_qubits)

# Initial state for HVA: classical Néel state |0101...> (a ground state of the antiferromagnetic ZZ terms).
def neel_state():
    for i in range(n_qubits):
        if i % 2 == 1:
            qml.PauliX(wires=i)


# HVA layer: alternate XX, YY, ZZ exponentials (the three Heisenberg terms).
def hva_layer(params, n):
    for i in range(n - 1):
        qml.IsingXX(params[3 * i + 0], wires=[i, i + 1])
        qml.IsingYY(params[3 * i + 1], wires=[i, i + 1])
        qml.IsingZZ(params[3 * i + 2], wires=[i, i + 1])


dev = qml.device("default.qubit", wires=n_qubits)


@qml.qnode(dev)
def hva_circuit(params, p):
    """HVA with p layers."""
    neel_state()
    n_per_layer = 3 * (n_qubits - 1)
    for ell in range(p):
        hva_layer(params[ell * n_per_layer:(ell + 1) * n_per_layer], n_qubits)
    return qml.expval(H)


# Train for 1 layer, 2 layers, 4 layers.
for p in [1, 2, 4]:
    n_params = p * 3 * (n_qubits - 1)
    params = pnp.array(np.random.uniform(-0.1, 0.1, n_params), requires_grad=True)
    opt = qml.AdamOptimizer(stepsize=0.05)

    # Bind the layer count p so the optimizer differentiates only with respect to the angles.
    cost = lambda x: hva_circuit(x, p)
    for step in range(100):
        # step_and_cost returns the energy evaluated before the parameter update.
        params, energy = opt.step_and_cost(cost, params)

    print(f"p={p}: final energy = {float(energy):.4f}, n_params = {n_params}")

print(f"Exact ground-state energy: {min(np.linalg.eigvalsh(qml.matrix(H))):.4f}")

For larger $p$, HVA’s energy approaches the exact ground-state energy. Larger $p$ buys expressivity at the price of more parameters and a costlier optimization.

Common misconceptions

“HVA is the same as Trotterized adiabatic evolution.” Trotterized adiabatic evolution is an unparameterized HVA (parameters fixed by the adiabatic schedule). HVA optimizes those parameters, often achieving better results at lower depth.

“HVA always avoids barren plateaus.” Up to a point. At very large depth, HVA can still hit plateaus. The “depth where plateaus appear” is much higher than for HEA, but not infinite.

“HVA only works for Hamiltonian-grounded problems.” It works whenever you can write your problem as minimizing a Hamiltonian’s expectation. This includes chemistry, optimization (via the QAOA mapping), some machine-learning problems, and many physical-simulation tasks.

“HVA is a single specific ansatz.” It’s a family. Different splits of $H = H_A + H_B + \dots$ give different HVAs, with different convergence and trainability tradeoffs. Designing the right split is the variational physicist’s craft.

Decision rule

Use HVA when:

Your problem is naturally a ground-state problem. Chemistry, condensed-matter simulation, optimization (via QAOA).
You can decompose the Hamiltonian into easily implementable terms: Pauli strings for chemistry, problem-specific terms for optimization.
You want to mitigate barren plateaus. HVA is far less plateau-prone than HEA.
You can afford problem-specific compilation. HVA is not a generic library function — it requires knowing the Hamiltonian structure.

Use HEA (hardware-efficient ansatz) when:

You don’t have a Hamiltonian-structured problem (e.g., generic QML on classical data).
You need maximal flexibility.
The hardware’s native gate set strongly suggests a specific HEA topology.

For most quantum-chemistry and optimization research in 2026, HVA and its closely related problem-tailored ansätze (UCCSD, ADAPT-VQE, QAOA) are the production choice. HEA appears mostly in benchmark comparisons and pedagogical examples.

Where this goes next

Tutorial 67 covers warm-start strategies — the techniques for initializing HVA parameters in regions of parameter space where optimization converges quickly. Together with HVA’s architectural advantages, warm starts make variational algorithms practical at scales where naive random initialization would fail.