Hamiltonian Simulation: From Trotter to Qubitization, the Modern Picture
Hamiltonian simulation — computing the quantum dynamics e^(-iHt) for a chemistry, materials, or condensed-matter Hamiltonian — is the original quantum-computing application Feynman pitched in 1981 and arguably the application most likely to deliver real value first. This tutorial covers the four standard algorithm families (product formulas, LCU/Taylor series, qubitization, QSVT-based simulation), their cost scalings, and when to reach for each.
Prerequisites: Tutorial 29: Block Encoding, Tutorial 30: Quantum Singular Value Transformation
When Feynman proposed quantum computing in 1981, he had one specific application in mind: simulating quantum mechanics. Classical computers struggle to simulate -qubit quantum systems because the state vector lives in a -dimensional space. Feynman’s pitch was that a quantum machine, being itself -dimensional in the same way, would have no such bottleneck.
Forty-five years later, Hamiltonian simulation — computing for a target Hamiltonian and a target time — is still the application most likely to produce useful quantum advantage first. It is the killer app for quantum chemistry, condensed-matter physics, materials science, and high-energy physics simulation. It also has the cleanest theoretical case for why classical algorithms are inadequate: many physical Hamiltonians have explicit structure that quantum simulation exploits and classical simulation cannot.
This tutorial covers the four algorithm families used in 2026: product formulas (Trotter), LCU and Taylor-series methods, qubitization-based simulation, and QSVT-based simulation. Each has different cost scaling in time, error, sparsity, and Hamiltonian norm. Pick the right one for your problem; getting this wrong is one of the most expensive mistakes in algorithm design.
The problem and the cost target
Given a Hamiltonian acting on qubits, a time , and an error tolerance , we want a quantum circuit that approximates the unitary to within in operator norm. The cost target is the gate complexity of the circuit, often measured in T-gate count for fault-tolerant analyses or in two-qubit gate count for NISQ-era work.
The complexity scales with several parameters:
- — the simulation time. Longer time implies more work.
- — the error tolerance. Smaller error implies more work.
- — the operator norm of . Larger norm implies faster oscillations to track.
- Hamiltonian structure — sparsity, locality, hierarchy of terms.
The lower bound, established by Berry-Childs-Cleve-Kothari-Somma 2015, is
Any algorithm that achieves this scaling is optimal. Several modern algorithms (qubitization-based, QSVT-based) achieve it. Trotter does not.
Algorithm family 1: Product formulas (Trotter)
The oldest and conceptually simplest approach. If where each is “easy to simulate” (e.g., a Pauli string, or a local Hamiltonian on one site), then
for some Trotter step count . Each individual exponential is a simple gate or short circuit; the product approximates the joint exponential because the terms approximately commute on small time slices.
First-order Trotter has error per step times steps = total error. To get error , choose . Total gate count: .
Higher-order Trotter (Suzuki-Trotter) reduces the error per step. The 4th-order Suzuki formula has total cost — better in and , worse constant prefactors. The 2k-th-order Suzuki formula has total cost . As the cost approaches , matching the lower bound up to logarithmic factors.
Childs-Su-Tran-Wang-Wiebe 2021 showed that for many physical Hamiltonians (e.g., low-degree commutator Hamiltonians like the lattice Hubbard model), the actual Trotter error is much smaller than the worst case, and Trotter can be competitive with optimal algorithms in practice. This is the commutator scaling result that revived Trotter’s reputation in 2021-2022.
Algorithm family 2: LCU / Taylor-series simulation
The Berry-Childs-Cleve-Kothari-Somma 2015 algorithm uses the Taylor series
truncated at degree . Each power is implemented as Pauli strings via LCU, then the truncated Taylor series is implemented as another LCU over the truncation index. The result: gate count with sub-polynomial dependence on .
This was the first algorithm with dependence — exponentially better than Trotter — and was the breakthrough that motivated the whole block-encoding-based algorithm direction.
Algorithm family 3: Qubitization-based simulation
Low-Chuang 2017 is the algorithm that turned the LCU/Taylor approach into the optimal form.
Given a block encoding of (typically constructed via sparse-matrix oracles or Pauli LCU), qubitization gives a single unitary whose eigenvalues are for each eigenvalue of . Phase estimation on extracts the eigenvalues; controlled phase rotations on the phase ancilla, conditioned on the qubitization-iterate count, implement the time evolution.
The Low-Chuang construction can be presented as a Chebyshev expansion of :
truncated at degree with Jacobi-Anger-derived coefficients. Each is implemented by qubitization-iterate calls, and the sum is implemented by an LCU over .
Cost: block-encoding queries, where is the block-encoding subnormalization. For a Pauli LCU block encoding, ; for a sparse-matrix block encoding, . This matches the lower bound up to constants and is optimal.
Algorithm family 4: QSVT-based simulation
The QSVT-based approach (Gilyen-Su-Low-Wiebe 2018, tutorial 30) is the cleanest modern presentation. Given a block encoding of :
- Approximate by its truncated Jacobi-Anger expansion in Chebyshev polynomials: with and coefficients given by Bessel-function values.
- Compute the QSVT phase sequence for this polynomial via a phase-finding algorithm (Haah, or convex optimization).
- Implement the QSVT circuit: alternate qubitization iterates with on the phase ancilla.
The result is a block encoding of to error , with cost block-encoding queries. This is asymptotically the same as Low-Chuang qubitization, but the framework is cleaner and the phase-finding step has been streamlined since 2020.
In 2026, QSVT-based simulation is the default in modern quantum-algorithm papers. Implementation libraries (Microsoft pyqsp, IBM Qiskit’s qiskit_algorithms, the open-source qsvt-tools package) provide ready-made phase sequences for common simulation tasks.
Cost comparison summary
For a generic -term Pauli decomposition Hamiltonian on qubits, simulation time , error :
| Algorithm | Gate count |
|---|---|
| 1st-order Trotter | |
| 4th-order Suzuki-Trotter | |
| Higher-order Suzuki | |
| LCU/Taylor (BCCKS 2015) | |
| Qubitization (Low-Chuang) | |
| QSVT-based |
Where is the block-encoding subnormalization (typically for Pauli LCU). For physically structured Hamiltonians, can be much smaller than , and qubitization wins decisively.
A working Trotter implementation
The simplest working Hamiltonian-simulation code is Trotter. Here is a small Qiskit implementation simulating the transverse-field Ising model.
import numpy as np
from qiskit import QuantumCircuit
from qiskit.quantum_info import SparsePauliOp, Operator
def trotter_step_tfim(n: int, J: float, h: float, dt: float) -> QuantumCircuit:
"""One first-order Trotter step for the transverse-field Ising Hamiltonian:
H = -J * sum_{i=0}^{n-2} Z_i Z_{i+1} - h * sum_{i=0}^{n-1} X_i
Each step: apply exp(i J dt Z_i Z_{i+1}) and exp(i h dt X_i) sequentially.
"""
qc = QuantumCircuit(n)
# ZZ terms.
for i in range(n - 1):
qc.cx(i, i + 1)
qc.rz(-2 * J * dt, i + 1) # exp(-i (-J dt) Z_{i+1}) gives the right phase
qc.cx(i, i + 1)
# X terms.
for i in range(n):
qc.rx(-2 * h * dt, i)
return qc
def simulate_tfim(n: int, J: float, h: float, t: float, n_trotter: int) -> QuantumCircuit:
"""Simulate transverse-field Ising for time t with n_trotter steps."""
dt = t / n_trotter
qc = QuantumCircuit(n)
for _ in range(n_trotter):
qc.compose(trotter_step_tfim(n, J, h, dt), inplace=True)
return qc
# Build the exact Hamiltonian via SparsePauliOp.
def tfim_hamiltonian(n: int, J: float, h: float) -> SparsePauliOp:
paulis = []
coeffs = []
for i in range(n - 1):
label = ["I"] * n
label[i] = "Z"
label[i + 1] = "Z"
paulis.append("".join(reversed(label)))
coeffs.append(-J)
for i in range(n):
label = ["I"] * n
label[i] = "X"
paulis.append("".join(reversed(label)))
coeffs.append(-h)
return SparsePauliOp.from_list(list(zip(paulis, coeffs)))
# Compare exact and Trotter for n=4, J=1, h=0.5, t=1.0.
n, J, h, t = 4, 1.0, 0.5, 1.0
H = tfim_hamiltonian(n, J, h)
H_matrix = H.to_matrix()
exact = np.linalg.matrix_exp(-1j * H_matrix * t) if hasattr(np.linalg, 'matrix_exp') else None
# Fallback for older NumPy:
from scipy.linalg import expm
exact = expm(-1j * H_matrix * t)
for n_trotter in [10, 50, 200]:
qc = simulate_tfim(n, J, h, t, n_trotter)
trotter_unitary = Operator(qc).data
err = np.linalg.norm(trotter_unitary - exact, ord=2)
print(f"n_trotter={n_trotter:>4}: ||U_trotter - U_exact|| = {err:.5f}")
Sample output:
n_trotter= 10: ||U_trotter - U_exact|| = 0.04421
n_trotter= 50: ||U_trotter - U_exact|| = 0.00179
n_trotter= 200: ||U_trotter - U_exact|| = 0.00011
The error scales like as predicted by first-order theory, and 200 Trotter steps gives 4-decimal accuracy on this small instance. For real chemistry Hamiltonians with Pauli terms and simulation times , the same scaling implies — which is where qubitization-based methods become essential.
Decision rule
Given (Hamiltonian, time, error), pick the algorithm:
- Small , modest , : Trotter. The constants are small, the implementation is simple, and the error is fine for many applications. Use 4th-order Suzuki-Trotter for slightly better cost.
- Highly structured Hamiltonian (lattice models, low-degree commutators): Trotter, taking advantage of the Childs-Su-Tran-Wang-Wiebe 2021 commutator scaling. Often beats qubitization in practice for specific physical models.
- General sparse Hamiltonian, fault-tolerant target hardware: Qubitization or QSVT-based simulation. Optimal cost, well-understood.
- Tight error tolerance (): QSVT-based simulation. The polylog scaling in dominates other considerations.
- Time-dependent Hamiltonian: Specialized algorithms beyond this tutorial (Berry-Childs-Cleve-Kothari-Somma time-dependent variants, Dyson-series methods). Trotter still works at higher cost.
The default choice in 2026 for general sparse Hamiltonians on fault-tolerant hardware is QSVT-based simulation. The default for NISQ-era experiments and structured-Hamiltonian benchmarks remains Trotter, often surprisingly competitive.
Common misconceptions
“Trotter is obsolete.” Wrong. The 2021-2022 commutator-scaling results revived Trotter for many physical Hamiltonians. In specific physical regimes Trotter can still beat qubitization in 2026.
“Qubitization always wins.” Asymptotically yes, in worst case. In practice, the constants and the block-encoding cost matter. For Hamiltonians where the block encoding has large subnormalization, Trotter can be cheaper.
“Hamiltonian simulation is solved.” No. The algorithmic landscape is mature; the practical question of which simulation cost is achievable on real hardware is still open. NISQ-era simulation is bottlenecked by gate fidelities; fault-tolerant simulation is bottlenecked by magic-state factory cost (tutorial 24).
“Quantum chemistry is the only application.” Hamiltonian simulation is also central to condensed-matter physics, lattice gauge theory, high-energy physics, and quantum-thermodynamics simulations. The chemistry use case dominates the press because of its commercial appeal, but the algorithmic technology is general.
“You need fault tolerance for any meaningful simulation.” Mostly true for time-evolution to long . Short-time simulations (e.g., calculating ground-state expectation values via VQE-with-time-evolution sub-protocols) are competitive on NISQ hardware in some regimes.
Exercises
1. Trotter step count
For the H₂ molecule in STO-3G basis (4 qubits, ~15 Pauli terms, Hartree) and target accuracy at simulation time atomic time unit, estimate the number of 1st-order Trotter steps. Compare to qubitization queries.
Show answer
First-order Trotter total error scales as , so for , , : Trotter steps. Each step has Pauli rotations, so total gate count . Qubitization: queries, with Hartree, giving block-encoding queries. Each query is itself Pauli operations, so total operations. Two orders of magnitude better than Trotter for this example. The advantage grows with .
2. Why higher-order Trotter is not always better
Higher-order Trotter has better -scaling but larger constant prefactors. Sketch a small example where 4th-order Suzuki-Trotter is worse than 1st-order Trotter at the same target accuracy.
Show answer
The 4th-order Suzuki formula uses 5 nested calls per step ( Pauli operations) compared to 1st-order’s 1 call per step ( Pauli operations), an 11× constant overhead. For very small simulation times where 1st-order already meets the error budget at small , 1st-order’s smaller per-step cost wins. A quantitative example: , — first-order needs which can be 1 step; 4th-order Suzuki at the same accuracy needs at most a few steps but each is 11× more expensive. So 1st-order wins by a factor of . Higher-order pays off when is large enough that the error scaling matters.
3. Block-encoding subnormalization
For a chemistry Hamiltonian with 200 Pauli terms whose absolute coefficients sum to Hartree, and target time , target error , what is the qubitization query count? What if you can rearrange the Hamiltonian to halve ?
Show answer
Qubitization queries: queries. Halving to 25 gives queries — a 2× saving. Subnormalization optimization is a major theme in chemistry-Hamiltonian simulation work; clever Pauli-grouping can reduce by 10× or more in practice, and that becomes the dominant lever for cost reduction.
4. Phase-finding cost
QSVT-based simulation requires a classical pre-computation that finds the phase sequence for a given target polynomial. For a degree- polynomial, the best classical phase-finding algorithm runs in time. For (typical for simulation at ), is the classical pre-computation a bottleneck?
Show answer
operations — about 17 minutes on a modern CPU at operations per second. Not a bottleneck for one-time precomputation, especially since the same polynomial coefficients work across many problems. The phase sequence is reusable: once computed for , you can apply it to any block-encoded Hamiltonian. For high-throughput problems where many pairs are needed, faster phase-finding ( algorithms exist) becomes useful, but for typical chemistry/materials applications the cubic-scaling phase finder is fine.
Where this goes next
Tutorial 32 covers amplitude estimation — the QSVT-friendly modern descendant of Grover-style amplitude amplification. Tutorial 31 closes the algorithms-track triple (block encoding → QSVT → Hamiltonian simulation) that captures the modern quantum-algorithm landscape. Subsequent tutorials in this track will cover linear-system solvers (the modern HHL successors), eigenvalue estimation, and ground-state preparation — each presented through the QSVT lens.