algorithms advanced · 16 min read · By LIPAI WANG · May 1, 2026

HHL: The Quantum Algorithm for Linear Systems and What Survived Dequantization

The Harrow-Hassidim-Lloyd algorithm (HHL, 2009) was the first quantum algorithm with exponential speedup over the best classical methods for solving sparse linear systems. It also became the most-discussed algorithm in machine learning, finance, and many applied domains — until Tang's 2018 dequantization revealed the exponential speedup depended critically on QRAM-like input access. This tutorial covers the algorithm, what HHL actually computes, the dequantization story, and the remaining regimes where HHL gives genuine speedups.

Prerequisites: Tutorial 11: QFT and Phase Estimation, Tutorial 41: Tang Dequantization

The Harrow-Hassidim-Lloyd algorithm (HHL, 2009) solves the linear system $A \boldsymbol{x} = \boldsymbol{b}$ on a quantum computer with runtime polylogarithmic in the matrix dimension. For large sparse matrices with classical algorithms requiring $O(N \cdot s)$ work (where $s$ is the matrix sparsity), HHL claimed an exponential speedup down to $O(\log N \cdot \text{poly}(\kappa, s, 1/\epsilon))$ — where $\kappa$ is the condition number.

For a few years, HHL was the most-cited “exponential speedup” algorithm outside of Shor’s. It powered claimed speedups in machine learning (recommender systems via Kerenidis-Prakash 2016), finance (option pricing), and engineering (fluid dynamics, structural analysis). Then in 2018, Tang’s dequantization (tutorial 41) showed that the input-access model HHL assumes — efficient amplitude-encoded $|b\rangle$ via QRAM — also enables classical sample-and-query algorithms to match many of HHL’s claimed speedups.

The post-dequantization picture: HHL gives a real speedup, but only in restricted regimes. Generic linear-systems problems on classical input data are dequantizable. Sparse, structured, or quantum-input problems retain quantum advantage. This tutorial covers the algorithm, what it actually computes, the dequantization caveat, and the surviving advantage regimes.

The problem and HHL’s claim

Linear systems: given a Hermitian matrix $A \in \mathbb{R}^{N \times N}$ and a vector $\boldsymbol{b} \in \mathbb{R}^N$ , compute $\boldsymbol{x} = A^{-1} \boldsymbol{b}$ .

Classical algorithms:

Direct (Gaussian elimination): $O(N^3)$ .
Iterative (conjugate gradient): $O(N \cdot s \cdot \kappa)$ for sparse $A$ with sparsity $s$ and condition number $\kappa$ .
For very large sparse matrices: still polynomial in $N$ .

HHL’s claim: $O(\log N \cdot \text{poly}(s, \kappa, 1/\epsilon))$ runtime — exponentially faster in $N$ , polynomial in the auxiliary parameters.

The asymmetry that makes this striking: $\log N$ is dramatically smaller than $N$ for any meaningful problem size. A $10^6 \times 10^6$ system is $20$ vs $10^6$ , a factor of $50{,}000$ speedup if all the auxiliary parameters are reasonable. This was the dream that motivated a decade of QML research.

What HHL actually computes

HHL does not give you $\boldsymbol{x}$ as classical numbers. It produces the amplitude-encoded quantum state $|x\rangle = A^{-1} |b\rangle / \|A^{-1} |b\rangle\|$ — a superposition with amplitudes proportional to the components of $\boldsymbol{x}$ .

To use this state, you can:

Measure observables of $\boldsymbol{x}$ : e.g., $\langle x | M | x \rangle$ for some efficient $M$ . Useful for many applications (estimate the largest component, compute expectation of a known operator).
Use $|x\rangle$ as input to another quantum algorithm. E.g., feed it into a quantum classifier or another HHL.
Read out classical samples of components: but this loses the speedup if you need many components, since each measurement gives one classical sample at $1/\sqrt{N}$ precision.

The third point is critical. You cannot read out $\boldsymbol{x}$ as a classical vector in $O(\log N)$ time. The quantum algorithm gives you the state; converting to classical numbers is a separate, expensive step. This output-access bottleneck is one of the structural reasons HHL is not a drop-in replacement for classical linear-systems solvers.

The algorithm structure

HHL’s three-step structure:

Step 1: Hamiltonian simulation

Treat $A$ as a Hamiltonian. Apply $e^{i A t}$ for various times $t$ . This requires that $A$ be implementable as a Hamiltonian (efficient quantum-circuit access to its action on a state), which is the input-access assumption.

Step 2: Phase estimation

Use quantum phase estimation (tutorial 11) on $e^{i A t}$ applied to $|b\rangle$ . This produces a register containing $|\lambda_k\rangle$ — the eigenvalues of $A$ — entangled with the eigenvectors of $A$ in the $|b\rangle$ decomposition.

Step 3: Conditional rotation

For each eigenvalue $\lambda_k$ , apply a controlled rotation that turns $\lambda_k$ into $1/\lambda_k$ on an ancilla qubit. Then the quantum state is approximately $|x\rangle = A^{-1}|b\rangle$ (after suitable amplitude amplification on the ancilla).

The total cost is $O(\log N \cdot \kappa^2 \cdot s^2 / \epsilon^3)$ in the original analysis, with later improvements bringing this down to $O(\log N \cdot \kappa \cdot s \cdot \log(1/\epsilon))$ via QSVT-based methods (tutorial 30).

The condition number bottleneck

The polynomial dependence on $\kappa$ is structural. If $A$ has very small eigenvalues (large condition number), the conditional rotation step has small amplitudes — amplitude amplification needs many rounds, and the algorithm’s runtime scales accordingly.

For ill-conditioned matrices (typical of many real-world problems), $\kappa$ can be very large. A $\kappa = 10^6$ matrix gives quantum runtime $O(\log N \cdot 10^6) \approx 2 \times 10^7$ , comparable to or worse than classical iterative methods. HHL’s exponential speedup in $N$ is real but contingent on $\kappa$ being modest. Many practical problems do not satisfy this.

The dequantization caveat

Tutorial 41 covered Tang’s 2018 result: with QRAM-style amplitude-encoded input access to $\boldsymbol{b}$ , classical sample-and-query algorithms can solve some related problems in time polynomial in $N$ , $\kappa$ , $s$ , and $1/\epsilon$ . For low-rank approximations and certain sparse problems, the classical algorithm matches HHL up to polynomial factors.

The implication for HHL specifically:

Low-rank dense problems: dequantizable. HHL’s speedup is illusory.
Sparse, well-conditioned problems with quantum-natural input: HHL retains the speedup.
High-rank dense problems: HHL’s input cost (loading $\boldsymbol{b}$ via QRAM, $O(N)$ time) eats the exponential speedup unless QRAM is amortized.

The post-dequantization rule of thumb: HHL is an exponential speedup over classical algorithms only if (a) the input has structure (sparse, low-rank, quantum-native) AND (b) you don’t need to read out the full classical solution AND (c) you can build the QRAM efficiently OR your application uses the output as a quantum state.

What HHL is now used for

Despite the dequantization story, HHL retains strong arguments for several applications:

Hamiltonian-simulation chemistry

Computing ground-state expectations of molecular Hamiltonians via linear-systems-style algorithms. The Hamiltonian itself defines the input matrix; no QRAM loading is needed. The output is a quantum state that can be measured directly.

Quantum-input regression

Regression problems where the data is itself a quantum state — outputs of a quantum simulator, sensor readings encoded in superposition. SQ access doesn’t apply because the data isn’t classical.

Variational warm starts

Use HHL to generate an initial guess for a variational quantum algorithm. The HHL output is a quantum state; the variational algorithm refines it. The HHL step is just a fast initialization, not the algorithm proper.

QSVT-based algorithm primitives

The modern HHL variant uses QSVT (tutorial 30) instead of phase estimation, with much better gate-count scaling. The QSVT approach is the production choice in 2026 for algorithms that need linear-systems-style operations as a subroutine.

A small HHL-style example

Concrete code that uses block-encoding to apply $A^{-1}$ to a state, in the QSVT-based HHL flavor:

import numpy as np
import pennylane as qml

# A small (4x4) well-conditioned matrix.
A = np.array([
    [2.0, 0.5, 0.0, 0.0],
    [0.5, 3.0, 1.0, 0.0],
    [0.0, 1.0, 2.5, 0.5],
    [0.0, 0.0, 0.5, 2.0]
])
b = np.array([1.0, 1.0, 1.0, 1.0])
b = b / np.linalg.norm(b)

# Classical solution.
x_classical = np.linalg.solve(A, b)
x_classical = x_classical / np.linalg.norm(x_classical)
print(f"Classical solution (normalized): {x_classical}")

# In a real HHL implementation, the matrix A would be block-encoded.
# Then QSVT applies a polynomial approximation to 1/x on the singular values of A.
# Here we directly compute the polynomial form for illustration.

# Simplified: condition number of A.
eigenvalues = np.linalg.eigvalsh(A)
kappa = max(eigenvalues) / min(eigenvalues)
print(f"Condition number: {kappa:.2f}")
print(f"Polynomial degree needed for 1/x approx: ~{int(np.log(kappa))}")
print(f"HHL gate count: O(log({A.shape[0]}) * kappa * log(1/epsilon)) = O({int(np.log2(A.shape[0]) * kappa * 5)})")

The HHL workflow in production:

Block-encode $A$ (tutorial 29).
Construct a polynomial approximation to $1/x$ over the singular values of $A$ .
Apply the polynomial via QSVT (tutorial 30).
The result is the amplitude-encoded inverse-applied state.

The QSVT-based HHL is dramatically faster than the original phase-estimation HHL — the constant factors and asymptotic scaling are both better.

Common misconceptions

“HHL gives exponential speedup for any linear system.” False. The speedup requires sparse + well-conditioned + quantum-input + quantum-output regimes simultaneously. Generic dense large- $\kappa$ problems are NOT exponentially sped up.

“HHL is dead because of dequantization.” Partially. Specific applications that motivated HHL (Kerenidis-Prakash recommendations, generic ML applications) are dequantizable. Other applications (Hamiltonian-grounded chemistry, quantum-input regression) survive.

“HHL is a black-box quantum primitive.” It’s not — implementing HHL requires real engineering decisions about input encoding, polynomial approximation, condition-number budget, etc. The QSVT-based variant has different engineering challenges than the original phase-estimation one.

“HHL is competing with classical solvers in 2026.” It’s not yet — small-scale HHL demos exist, but HHL on real-world ill-conditioned matrices is far from beating tuned classical iterative solvers in production. The competition is on specific structured problems where the structural advantage is provable.

Decision rule

Use HHL or HHL-derived algorithms when:

The input has structure that classical SQ doesn’t capture. Quantum-native input or specifically structured sparse problems.
The output is consumed as a quantum state. No need to read out classical components.
The condition number is modest. $\kappa$ in the hundreds, not millions.
You’re implementing a fault-tolerant subroutine. HHL is a structural primitive in qubitization-based algorithms.

Don’t use HHL when:

You need classical solution components. Output bottleneck eats the speedup.
The matrix is large and dense without QRAM. Input loading dominates.
You’re solving a problem where Tang-style dequantization applies. Run a classical algorithm with comparable parameters first.

For most production linear-algebra workflows in 2026, HHL is not the right tool. For structural quantum-algorithm research and for fault-tolerant primitives, HHL-derived techniques are central.

Exercises

1. Why output-readout dominates

You compute $|x\rangle = A^{-1} |b\rangle$ via HHL in $O(\log N)$ time. To read out all $N$ components classically, what is the total runtime?

Show answer

Reading out one component requires measurement, costing $1/\sqrt{N}$ precision per measurement. To get $\epsilon$ -precision on each component requires $1/\epsilon^2$ measurements. To get $N$ components: $N \times 1/\epsilon^2$ measurements. The full classical readout takes $O(N / \epsilon^2)$ measurements, which is much worse than the $O(\log N)$ HHL runtime — and worse than a classical algorithm to begin with. HHL’s speedup only applies to applications where you don’t need full classical readout. This is one of the most important misunderstandings about HHL.

2. Condition number and runtime

A real-world linear system has $N = 10^9$ and $\kappa = 10^4$ . Compare HHL’s runtime against classical conjugate gradient.

Show answer

HHL: $O(\log N \cdot \kappa) = O(30 \cdot 10^4) = 3 \times 10^5$ . Classical CG: $O(N \cdot s \cdot \sqrt{\kappa})$ for sparse case ≈ $O(10^9 \cdot 10 \cdot 100) = 10^{12}$ for $s = 10$ . HHL is 7 orders of magnitude faster. For sparse well-conditioned systems, HHL’s speedup is real even at modest $\kappa$ . The catch is: this is quantum runtime, requiring the quantum machine to exist. With current 2026 hardware (no fault-tolerant quantum computer), running HHL on this system size is impossible. Classical CG runs now.

3. The dequantization match

A 2024 paper claims HHL achieves $O(\log N)$ for a structured matrix-inversion problem. Tang-style dequantization gives $O(\text{poly}(\log N))$ for the same problem with classical SQ access. Is the quantum speedup meaningful?

Show answer

If both are polylogarithmic in $N$ but the quantum is $O(\log N)$ and classical is $O((\log N)^k)$ for some $k > 1$ , the speedup is polynomial in $\log N$ — not exponential, not nothing. Useful but not transformative. The honest description would be “polynomial speedup over classical SQ” rather than “exponential speedup over classical.” Many post-2018 HHL claims have been recharacterized this way as the dequantization implications were absorbed.

4. HHL as a subroutine

A larger algorithm uses HHL as a subroutine, computing $A^{-1} |b\rangle$ to use in a downstream quantum step. The downstream step is itself BQP-complete. Does HHL retain its speedup in this context?

Show answer

Yes. The downstream step is a quantum operation with no classical SQ analog (BQP-complete subroutines are not dequantizable). The HHL output is consumed quantumly without classical readout. The full algorithm enjoys the quantum speedup, with HHL contributing logarithmic-in- $N$ matrix-inversion subroutines. This is the structurally cleanest application of HHL: as part of a fault-tolerant pipeline that uses quantum states throughout. HHL as a quantum-pipeline primitive survives dequantization fully. The speedup in pure quantum algorithms is intact; it’s only the classical-input/classical-output framing that is dequantizable.

Where this goes next

Tutorial 59 covers LCU (Linear Combination of Unitaries) — the framework that powers HHL, qubitization, and modern Hamiltonian simulation. Tutorial 60 covers quantum phase estimation precision analysis, the technical workhorse of HHL and many related algorithms.

The problem and HHL’s claim

What HHL actually computes

The algorithm structure

Step 1: Hamiltonian simulation

Step 2: Phase estimation

Step 3: Conditional rotation

The condition number bottleneck

The dequantization caveat

What HHL is now used for

Hamiltonian-simulation chemistry

Quantum-input regression

Variational warm starts

QSVT-based algorithm primitives

A small HHL-style example

Common misconceptions

Decision rule

Exercises

1. Why output-readout dominates

2. Condition number and runtime

3. The dequantization match

4. HHL as a subroutine

Where this goes next

Quantum, for people who already code.