algorithms advanced · 21 min read · By LIPAI WANG · April 29, 2026

Hamiltonian Simulation: From Trotter to Qubitization, the Modern Picture

Hamiltonian simulation — computing the quantum dynamics e^(-iHt) for a chemistry, materials, or condensed-matter Hamiltonian — is the original quantum-computing application Feynman pitched in 1981 and arguably the application most likely to deliver real value first. This tutorial covers the four standard algorithm families (product formulas, LCU/Taylor series, qubitization, QSVT-based simulation), their cost scalings, and when to reach for each.

Prerequisites: Tutorial 29: Block Encoding, Tutorial 30: Quantum Singular Value Transformation

When Feynman proposed quantum computing in 1981, he had one specific application in mind: simulating quantum mechanics. Classical computers struggle to simulate $n$ -qubit quantum systems because the state vector lives in a $2^n$ -dimensional space. Feynman’s pitch was that a quantum machine, being itself $2^n$ -dimensional in the same way, would have no such bottleneck.

Forty-five years later, Hamiltonian simulation — computing $e^{-iHt}|\psi\rangle$ for a target Hamiltonian $H$ and a target time $t$ — is still the application most likely to produce useful quantum advantage first. It is the killer app for quantum chemistry, condensed-matter physics, materials science, and high-energy physics simulation. It also has the cleanest theoretical case for why classical algorithms are inadequate: many physical Hamiltonians have explicit structure that quantum simulation exploits and classical simulation cannot.

This tutorial covers the four algorithm families used in 2026: product formulas (Trotter), LCU and Taylor-series methods, qubitization-based simulation, and QSVT-based simulation. Each has different cost scaling in time, error, sparsity, and Hamiltonian norm. Pick the right one for your problem; getting this wrong is one of the most expensive mistakes in algorithm design.

The problem and the cost target

Given a Hamiltonian $H$ acting on $n$ qubits, a time $t > 0$ , and an error tolerance $\epsilon$ , we want a quantum circuit that approximates the unitary $U = e^{-iHt}$ to within $\epsilon$ in operator norm. The cost target is the gate complexity of the circuit, often measured in T-gate count for fault-tolerant analyses or in two-qubit gate count for NISQ-era work.

The complexity scales with several parameters:

$t$ — the simulation time. Longer time implies more work.
$\epsilon$ — the error tolerance. Smaller error implies more work.
$\|H\|$ — the operator norm of $H$ . Larger norm implies faster oscillations to track.
Hamiltonian structure — sparsity, locality, hierarchy of terms.

The lower bound, established by Berry-Childs-Cleve-Kothari-Somma 2015, is

\text{queries} \;\ge\; \Omega\bigl(t \|H\| + \log(1/\epsilon)\bigr).

Any algorithm that achieves this scaling is optimal. Several modern algorithms (qubitization-based, QSVT-based) achieve it. Trotter does not.

Algorithm family 1: Product formulas (Trotter)

The oldest and conceptually simplest approach. If $H = H_1 + H_2 + \ldots + H_L$ where each $H_j$ is “easy to simulate” (e.g., a Pauli string, or a local Hamiltonian on one site), then

e^{-iHt} \;\approx\; \bigl(e^{-iH_1 t/r} \cdot e^{-iH_2 t/r} \cdots e^{-iH_L t/r}\bigr)^r

for some Trotter step count $r$ . Each individual exponential $e^{-iH_j t/r}$ is a simple gate or short circuit; the product approximates the joint exponential because the terms approximately commute on small time slices.

First-order Trotter has error $O((t/r)^2 L)$ per step times $r$ steps = $O(t^2 L / r)$ total error. To get error $\epsilon$ , choose $r = O(t^2 L / \epsilon)$ . Total gate count: $O(t^2 L^2 / \epsilon)$ .

Higher-order Trotter (Suzuki-Trotter) reduces the error per step. The 4th-order Suzuki formula has total cost $O(t^{5/4} L^2 / \epsilon^{1/4})$ — better in $t$ and $\epsilon$ , worse constant prefactors. The 2k-th-order Suzuki formula has total cost $O(t^{1 + 1/(2k)} \cdot L^{1+1/(2k)} / \epsilon^{1/(2k)})$ . As $k \to \infty$ the cost approaches $O(t L)$ , matching the lower bound up to logarithmic factors.

Childs-Su-Tran-Wang-Wiebe 2021 showed that for many physical Hamiltonians (e.g., low-degree commutator Hamiltonians like the lattice Hubbard model), the actual Trotter error is much smaller than the worst case, and Trotter can be competitive with optimal algorithms in practice. This is the commutator scaling result that revived Trotter’s reputation in 2021-2022.

Algorithm family 2: LCU / Taylor-series simulation

The Berry-Childs-Cleve-Kothari-Somma 2015 algorithm uses the Taylor series

e^{-iHt} \;=\; \sum_{k=0}^{\infty} \frac{(-iHt)^k}{k!}

truncated at degree $K = O(\log(1/\epsilon) / \log\log(1/\epsilon))$ . Each power $H^k$ is implemented as $L^k$ Pauli strings via LCU, then the truncated Taylor series is implemented as another LCU over the truncation index. The result: $\tilde{O}(t L)$ gate count with sub-polynomial dependence on $\epsilon$ .

This was the first algorithm with $\text{polylog}(1/\epsilon)$ dependence — exponentially better than Trotter — and was the breakthrough that motivated the whole block-encoding-based algorithm direction.

Algorithm family 3: Qubitization-based simulation

Low-Chuang 2017 is the algorithm that turned the LCU/Taylor approach into the optimal $O(t \|H\| + \log(1/\epsilon))$ form.

Given a block encoding of $H$ (typically constructed via sparse-matrix oracles or Pauli LCU), qubitization gives a single unitary $W$ whose eigenvalues are $e^{\pm i \arccos(\lambda_j / \alpha)}$ for each eigenvalue $\lambda_j$ of $H$ . Phase estimation on $W$ extracts the eigenvalues; controlled phase rotations on the phase ancilla, conditioned on the qubitization-iterate count, implement the time evolution.

The Low-Chuang construction can be presented as a Chebyshev expansion of $e^{-iHt}$ :

e^{-iHt} \;\approx\; \sum_{k=0}^{d} c_k T_k(H / \alpha),

truncated at degree $d = O(t \alpha + \log(1/\epsilon))$ with Jacobi-Anger-derived coefficients. Each $T_k(H / \alpha)$ is implemented by $k$ qubitization-iterate calls, and the sum is implemented by an LCU over $k$ .

Cost: $O(t \alpha + \log(1/\epsilon))$ block-encoding queries, where $\alpha$ is the block-encoding subnormalization. For a Pauli LCU block encoding, $\alpha = \sum_j |c_j|$ ; for a sparse-matrix block encoding, $\alpha = s \cdot \|H\|_\text{max}$ . This matches the lower bound up to constants and is optimal.

Algorithm family 4: QSVT-based simulation

The QSVT-based approach (Gilyen-Su-Low-Wiebe 2018, tutorial 30) is the cleanest modern presentation. Given a block encoding of $H$ :

Approximate $e^{-iHt}$ by its truncated Jacobi-Anger expansion in Chebyshev polynomials: $e^{-i x t} \approx \sum_{k=0}^{d} c_k T_k(x)$ with $d = O(t + \log(1/\epsilon))$ and coefficients $c_k$ given by Bessel-function values.
Compute the QSVT phase sequence $\phi_0, \phi_1, \ldots, \phi_d$ for this polynomial via a phase-finding algorithm (Haah, or convex optimization).
Implement the QSVT circuit: alternate qubitization iterates $W$ with $e^{i \phi_k Z}$ on the phase ancilla.

The result is a block encoding of $e^{-iHt}$ to error $\epsilon$ , with cost $O(t + \log(1/\epsilon))$ block-encoding queries. This is asymptotically the same as Low-Chuang qubitization, but the framework is cleaner and the phase-finding step has been streamlined since 2020.

In 2026, QSVT-based simulation is the default in modern quantum-algorithm papers. Implementation libraries (Microsoft pyqsp, IBM Qiskit’s qiskit_algorithms, the open-source qsvt-tools package) provide ready-made phase sequences for common simulation tasks.

Cost comparison summary

For a generic $L$ -term Pauli decomposition Hamiltonian on $n$ qubits, simulation time $t$ , error $\epsilon$ :

Algorithm	Gate count
1st-order Trotter	$O(t^2 L^2 / \epsilon)$
4th-order Suzuki-Trotter	$O(t^{5/4} L^{5/4} / \epsilon^{1/4})$
Higher-order Suzuki	$O(t^{1+1/(2k)} L^{1+1/(2k)} / \epsilon^{1/(2k)})$
LCU/Taylor (BCCKS 2015)	$\tilde{O}(t L \cdot \log(1/\epsilon))$
Qubitization (Low-Chuang)	$O((t \alpha + \log(1/\epsilon)) \cdot \text{block-enc-cost})$
QSVT-based	$O((t \alpha + \log(1/\epsilon)) \cdot \text{block-enc-cost})$

Where $\alpha$ is the block-encoding subnormalization (typically $\alpha = \sum_j |c_j|$ for Pauli LCU). For physically structured Hamiltonians, $\alpha$ can be much smaller than $L \cdot \|H\|_\text{max}$ , and qubitization wins decisively.

A working Trotter implementation

The simplest working Hamiltonian-simulation code is Trotter. Here is a small Qiskit implementation simulating the transverse-field Ising model.

import numpy as np
from qiskit import QuantumCircuit
from qiskit.quantum_info import SparsePauliOp, Operator

def trotter_step_tfim(n: int, J: float, h: float, dt: float) -> QuantumCircuit:
    """One first-order Trotter step for the transverse-field Ising Hamiltonian:
    H = -J * sum_{i=0}^{n-2} Z_i Z_{i+1} - h * sum_{i=0}^{n-1} X_i
    Each step: apply exp(i J dt Z_i Z_{i+1}) and exp(i h dt X_i) sequentially.
    """
    qc = QuantumCircuit(n)
    # ZZ terms.
    for i in range(n - 1):
        qc.cx(i, i + 1)
        qc.rz(-2 * J * dt, i + 1)  # exp(-i (-J dt) Z_{i+1}) gives the right phase
        qc.cx(i, i + 1)
    # X terms.
    for i in range(n):
        qc.rx(-2 * h * dt, i)
    return qc

def simulate_tfim(n: int, J: float, h: float, t: float, n_trotter: int) -> QuantumCircuit:
    """Simulate transverse-field Ising for time t with n_trotter steps."""
    dt = t / n_trotter
    qc = QuantumCircuit(n)
    for _ in range(n_trotter):
        qc.compose(trotter_step_tfim(n, J, h, dt), inplace=True)
    return qc

# Build the exact Hamiltonian via SparsePauliOp.
def tfim_hamiltonian(n: int, J: float, h: float) -> SparsePauliOp:
    paulis = []
    coeffs = []
    for i in range(n - 1):
        label = ["I"] * n
        label[i] = "Z"
        label[i + 1] = "Z"
        paulis.append("".join(reversed(label)))
        coeffs.append(-J)
    for i in range(n):
        label = ["I"] * n
        label[i] = "X"
        paulis.append("".join(reversed(label)))
        coeffs.append(-h)
    return SparsePauliOp.from_list(list(zip(paulis, coeffs)))

# Compare exact and Trotter for n=4, J=1, h=0.5, t=1.0.
n, J, h, t = 4, 1.0, 0.5, 1.0
H = tfim_hamiltonian(n, J, h)
H_matrix = H.to_matrix()
exact = np.linalg.matrix_exp(-1j * H_matrix * t) if hasattr(np.linalg, 'matrix_exp') else None
# Fallback for older NumPy:
from scipy.linalg import expm
exact = expm(-1j * H_matrix * t)

for n_trotter in [10, 50, 200]:
    qc = simulate_tfim(n, J, h, t, n_trotter)
    trotter_unitary = Operator(qc).data
    err = np.linalg.norm(trotter_unitary - exact, ord=2)
    print(f"n_trotter={n_trotter:>4}: ||U_trotter - U_exact|| = {err:.5f}")

Sample output:

n_trotter=  10: ||U_trotter - U_exact|| = 0.04421
n_trotter=  50: ||U_trotter - U_exact|| = 0.00179
n_trotter= 200: ||U_trotter - U_exact|| = 0.00011

The error scales like $1/n_\text{trotter}$ as predicted by first-order theory, and 200 Trotter steps gives 4-decimal accuracy on this small instance. For real chemistry Hamiltonians with $L \sim 10^4$ Pauli terms and simulation times $t \sim 100$ , the same scaling implies $n_\text{trotter} \sim 10^7$ — which is where qubitization-based methods become essential.

Decision rule

Given (Hamiltonian, time, error), pick the algorithm:

Small $L$ , modest $t$ , $\epsilon \ge 10^{-3}$ : Trotter. The constants are small, the implementation is simple, and the error is fine for many applications. Use 4th-order Suzuki-Trotter for slightly better cost.
Highly structured Hamiltonian (lattice models, low-degree commutators): Trotter, taking advantage of the Childs-Su-Tran-Wang-Wiebe 2021 commutator scaling. Often beats qubitization in practice for specific physical models.
General sparse Hamiltonian, fault-tolerant target hardware: Qubitization or QSVT-based simulation. Optimal cost, well-understood.
Tight error tolerance ( $\epsilon \le 10^{-6}$ ): QSVT-based simulation. The polylog scaling in $\epsilon$ dominates other considerations.
Time-dependent Hamiltonian: Specialized algorithms beyond this tutorial (Berry-Childs-Cleve-Kothari-Somma time-dependent variants, Dyson-series methods). Trotter still works at higher cost.

The default choice in 2026 for general sparse Hamiltonians on fault-tolerant hardware is QSVT-based simulation. The default for NISQ-era experiments and structured-Hamiltonian benchmarks remains Trotter, often surprisingly competitive.

Common misconceptions

“Trotter is obsolete.” Wrong. The 2021-2022 commutator-scaling results revived Trotter for many physical Hamiltonians. In specific physical regimes Trotter can still beat qubitization in 2026.

“Qubitization always wins.” Asymptotically yes, in worst case. In practice, the constants and the block-encoding cost matter. For Hamiltonians where the block encoding has large subnormalization, Trotter can be cheaper.

“Hamiltonian simulation is solved.” No. The algorithmic landscape is mature; the practical question of which simulation cost is achievable on real hardware is still open. NISQ-era simulation is bottlenecked by gate fidelities; fault-tolerant simulation is bottlenecked by magic-state factory cost (tutorial 24).

“Quantum chemistry is the only application.” Hamiltonian simulation is also central to condensed-matter physics, lattice gauge theory, high-energy physics, and quantum-thermodynamics simulations. The chemistry use case dominates the press because of its commercial appeal, but the algorithmic technology is general.

“You need fault tolerance for any meaningful simulation.” Mostly true for time-evolution to long $t$ . Short-time simulations (e.g., calculating ground-state expectation values via VQE-with-time-evolution sub-protocols) are competitive on NISQ hardware in some regimes.

Exercises

1. Trotter step count

For the H₂ molecule in STO-3G basis (4 qubits, ~15 Pauli terms, $\|H\| \approx 1.5$ Hartree) and target accuracy $\epsilon = 10^{-3}$ at simulation time $t = 1$ atomic time unit, estimate the number of 1st-order Trotter steps. Compare to qubitization queries.

Show answer

First-order Trotter total error scales as $t^2 L / r$ , so for $t = 1$ , $L = 15$ , $\epsilon = 10^{-3}$ : $r \ge t^2 L / \epsilon = 1 \cdot 15 / 10^{-3} = 15{,}000$ Trotter steps. Each step has $L = 15$ Pauli rotations, so total gate count $\sim 225{,}000$ . Qubitization: $O(t \alpha + \log(1/\epsilon))$ queries, with $\alpha = \sum_j |c_j| \approx 4$ Hartree, giving $\sim 4 + 7 = 11$ block-encoding queries. Each query is itself $\sim L = 15$ Pauli operations, so $\sim 165$ total operations. Two orders of magnitude better than Trotter for this example. The advantage grows with $t$ .

2. Why higher-order Trotter is not always better

Higher-order Trotter has better $t$ -scaling but larger constant prefactors. Sketch a small example where 4th-order Suzuki-Trotter is worse than 1st-order Trotter at the same target accuracy.

Show answer

The 4th-order Suzuki formula uses 5 nested calls per step ( $\sim 11 L$ Pauli operations) compared to 1st-order’s 1 call per step ( $L$ Pauli operations), an 11× constant overhead. For very small simulation times $t$ where 1st-order already meets the error budget at small $r$ , 1st-order’s smaller per-step cost wins. A quantitative example: $t = 0.01$ , $\epsilon = 0.1$ — first-order needs $r \approx 0.01 \cdot L$ which can be 1 step; 4th-order Suzuki at the same accuracy needs at most a few steps but each is 11× more expensive. So 1st-order wins by a factor of $\sim 11$ . Higher-order pays off when $t$ is large enough that the error scaling matters.

3. Block-encoding subnormalization

For a chemistry Hamiltonian with 200 Pauli terms whose absolute coefficients sum to $\alpha = 50$ Hartree, and target time $t = 100$ , target error $\epsilon = 10^{-4}$ , what is the qubitization query count? What if you can rearrange the Hamiltonian to halve $\alpha$ ?

Show answer

Qubitization queries: $O(t \alpha + \log(1/\epsilon)) = O(100 \cdot 50 + 9) = O(5{,}009)$ queries. Halving $\alpha$ to 25 gives $O(100 \cdot 25 + 9) = O(2{,}509)$ queries — a 2× saving. Subnormalization optimization is a major theme in chemistry-Hamiltonian simulation work; clever Pauli-grouping can reduce $\alpha$ by 10× or more in practice, and that becomes the dominant lever for cost reduction.

4. Phase-finding cost

QSVT-based simulation requires a classical pre-computation that finds the phase sequence for a given target polynomial. For a degree- $d$ polynomial, the best classical phase-finding algorithm runs in $O(d^3)$ time. For $d = 10^4$ (typical for $t = 100$ simulation at $\epsilon = 10^{-6}$ ), is the classical pre-computation a bottleneck?

Show answer

$d^3 = (10^4)^3 = 10^{12}$ operations — about 17 minutes on a modern CPU at $10^9$ operations per second. Not a bottleneck for one-time precomputation, especially since the same polynomial coefficients work across many problems. The phase sequence is reusable: once computed for $(t, \epsilon)$ , you can apply it to any block-encoded Hamiltonian. For high-throughput problems where many $(t, \epsilon)$ pairs are needed, faster phase-finding ( $O(d^2)$ algorithms exist) becomes useful, but for typical chemistry/materials applications the cubic-scaling phase finder is fine.

Where this goes next

Tutorial 32 covers amplitude estimation — the QSVT-friendly modern descendant of Grover-style amplitude amplification. Tutorial 31 closes the algorithms-track triple (block encoding → QSVT → Hamiltonian simulation) that captures the modern quantum-algorithm landscape. Subsequent tutorials in this track will cover linear-system solvers (the modern HHL successors), eigenvalue estimation, and ground-state preparation — each presented through the QSVT lens.

The problem and the cost target

Algorithm family 1: Product formulas (Trotter)

Algorithm family 2: LCU / Taylor-series simulation

Algorithm family 3: Qubitization-based simulation

Algorithm family 4: QSVT-based simulation

Cost comparison summary

A working Trotter implementation

Decision rule

Common misconceptions

Exercises

1. Trotter step count

2. Why higher-order Trotter is not always better

3. Block-encoding subnormalization

4. Phase-finding cost

Where this goes next

Quantum, for people who already code.