Quantum Outpost
algorithms advanced · 16 min read ·

HHL: The Quantum Algorithm for Linear Systems and What Survived Dequantization

The Harrow-Hassidim-Lloyd algorithm (HHL, 2009) was the first quantum algorithm with exponential speedup over the best classical methods for solving sparse linear systems. It also became the most-discussed algorithm in machine learning, finance, and many applied domains — until Tang's 2018 dequantization revealed the exponential speedup depended critically on QRAM-like input access. This tutorial covers the algorithm, what HHL actually computes, the dequantization story, and the remaining regimes where HHL gives genuine speedups.

Prerequisites: Tutorial 11: QFT and Phase Estimation, Tutorial 41: Tang Dequantization

The Harrow-Hassidim-Lloyd algorithm (HHL, 2009) solves the linear system Ax=bA \boldsymbol{x} = \boldsymbol{b} on a quantum computer with runtime polylogarithmic in the matrix dimension. For large sparse matrices with classical algorithms requiring O(Ns)O(N \cdot s) work (where ss is the matrix sparsity), HHL claimed an exponential speedup down to O(logNpoly(κ,s,1/ϵ))O(\log N \cdot \text{poly}(\kappa, s, 1/\epsilon)) — where κ\kappa is the condition number.

For a few years, HHL was the most-cited “exponential speedup” algorithm outside of Shor’s. It powered claimed speedups in machine learning (recommender systems via Kerenidis-Prakash 2016), finance (option pricing), and engineering (fluid dynamics, structural analysis). Then in 2018, Tang’s dequantization (tutorial 41) showed that the input-access model HHL assumes — efficient amplitude-encoded b|b\rangle via QRAM — also enables classical sample-and-query algorithms to match many of HHL’s claimed speedups.

The post-dequantization picture: HHL gives a real speedup, but only in restricted regimes. Generic linear-systems problems on classical input data are dequantizable. Sparse, structured, or quantum-input problems retain quantum advantage. This tutorial covers the algorithm, what it actually computes, the dequantization caveat, and the surviving advantage regimes.

The problem and HHL’s claim

Linear systems: given a Hermitian matrix ARN×NA \in \mathbb{R}^{N \times N} and a vector bRN\boldsymbol{b} \in \mathbb{R}^N, compute x=A1b\boldsymbol{x} = A^{-1} \boldsymbol{b}.

Classical algorithms:

  • Direct (Gaussian elimination): O(N3)O(N^3).
  • Iterative (conjugate gradient): O(Nsκ)O(N \cdot s \cdot \kappa) for sparse AA with sparsity ss and condition number κ\kappa.
  • For very large sparse matrices: still polynomial in NN.

HHL’s claim: O(logNpoly(s,κ,1/ϵ))O(\log N \cdot \text{poly}(s, \kappa, 1/\epsilon)) runtime — exponentially faster in NN, polynomial in the auxiliary parameters.

The asymmetry that makes this striking: logN\log N is dramatically smaller than NN for any meaningful problem size. A 106×10610^6 \times 10^6 system is 2020 vs 10610^6, a factor of 50,00050{,}000 speedup if all the auxiliary parameters are reasonable. This was the dream that motivated a decade of QML research.

What HHL actually computes

HHL does not give you x\boldsymbol{x} as classical numbers. It produces the amplitude-encoded quantum state x=A1b/A1b|x\rangle = A^{-1} |b\rangle / \|A^{-1} |b\rangle\| — a superposition with amplitudes proportional to the components of x\boldsymbol{x}.

To use this state, you can:

  1. Measure observables of x\boldsymbol{x}: e.g., xMx\langle x | M | x \rangle for some efficient MM. Useful for many applications (estimate the largest component, compute expectation of a known operator).
  2. Use x|x\rangle as input to another quantum algorithm. E.g., feed it into a quantum classifier or another HHL.
  3. Read out classical samples of components: but this loses the speedup if you need many components, since each measurement gives one classical sample at 1/N1/\sqrt{N} precision.

The third point is critical. You cannot read out x\boldsymbol{x} as a classical vector in O(logN)O(\log N) time. The quantum algorithm gives you the state; converting to classical numbers is a separate, expensive step. This output-access bottleneck is one of the structural reasons HHL is not a drop-in replacement for classical linear-systems solvers.

The algorithm structure

HHL’s three-step structure:

Step 1: Hamiltonian simulation

Treat AA as a Hamiltonian. Apply eiAte^{i A t} for various times tt. This requires that AA be implementable as a Hamiltonian (efficient quantum-circuit access to its action on a state), which is the input-access assumption.

Step 2: Phase estimation

Use quantum phase estimation (tutorial 11) on eiAte^{i A t} applied to b|b\rangle. This produces a register containing λk|\lambda_k\rangle — the eigenvalues of AA — entangled with the eigenvectors of AA in the b|b\rangle decomposition.

Step 3: Conditional rotation

For each eigenvalue λk\lambda_k, apply a controlled rotation that turns λk\lambda_k into 1/λk1/\lambda_k on an ancilla qubit. Then the quantum state is approximately x=A1b|x\rangle = A^{-1}|b\rangle (after suitable amplitude amplification on the ancilla).

The total cost is O(logNκ2s2/ϵ3)O(\log N \cdot \kappa^2 \cdot s^2 / \epsilon^3) in the original analysis, with later improvements bringing this down to O(logNκslog(1/ϵ))O(\log N \cdot \kappa \cdot s \cdot \log(1/\epsilon)) via QSVT-based methods (tutorial 30).

The condition number bottleneck

The polynomial dependence on κ\kappa is structural. If AA has very small eigenvalues (large condition number), the conditional rotation step has small amplitudes — amplitude amplification needs many rounds, and the algorithm’s runtime scales accordingly.

For ill-conditioned matrices (typical of many real-world problems), κ\kappa can be very large. A κ=106\kappa = 10^6 matrix gives quantum runtime O(logN106)2×107O(\log N \cdot 10^6) \approx 2 \times 10^7, comparable to or worse than classical iterative methods. HHL’s exponential speedup in NN is real but contingent on κ\kappa being modest. Many practical problems do not satisfy this.

The dequantization caveat

Tutorial 41 covered Tang’s 2018 result: with QRAM-style amplitude-encoded input access to b\boldsymbol{b}, classical sample-and-query algorithms can solve some related problems in time polynomial in NN, κ\kappa, ss, and 1/ϵ1/\epsilon. For low-rank approximations and certain sparse problems, the classical algorithm matches HHL up to polynomial factors.

The implication for HHL specifically:

  • Low-rank dense problems: dequantizable. HHL’s speedup is illusory.
  • Sparse, well-conditioned problems with quantum-natural input: HHL retains the speedup.
  • High-rank dense problems: HHL’s input cost (loading b\boldsymbol{b} via QRAM, O(N)O(N) time) eats the exponential speedup unless QRAM is amortized.

The post-dequantization rule of thumb: HHL is an exponential speedup over classical algorithms only if (a) the input has structure (sparse, low-rank, quantum-native) AND (b) you don’t need to read out the full classical solution AND (c) you can build the QRAM efficiently OR your application uses the output as a quantum state.

What HHL is now used for

Despite the dequantization story, HHL retains strong arguments for several applications:

Hamiltonian-simulation chemistry

Computing ground-state expectations of molecular Hamiltonians via linear-systems-style algorithms. The Hamiltonian itself defines the input matrix; no QRAM loading is needed. The output is a quantum state that can be measured directly.

Quantum-input regression

Regression problems where the data is itself a quantum state — outputs of a quantum simulator, sensor readings encoded in superposition. SQ access doesn’t apply because the data isn’t classical.

Variational warm starts

Use HHL to generate an initial guess for a variational quantum algorithm. The HHL output is a quantum state; the variational algorithm refines it. The HHL step is just a fast initialization, not the algorithm proper.

QSVT-based algorithm primitives

The modern HHL variant uses QSVT (tutorial 30) instead of phase estimation, with much better gate-count scaling. The QSVT approach is the production choice in 2026 for algorithms that need linear-systems-style operations as a subroutine.

A small HHL-style example

Concrete code that uses block-encoding to apply A1A^{-1} to a state, in the QSVT-based HHL flavor:

import numpy as np
import pennylane as qml

# A small (4x4) well-conditioned matrix.
A = np.array([
    [2.0, 0.5, 0.0, 0.0],
    [0.5, 3.0, 1.0, 0.0],
    [0.0, 1.0, 2.5, 0.5],
    [0.0, 0.0, 0.5, 2.0]
])
b = np.array([1.0, 1.0, 1.0, 1.0])
b = b / np.linalg.norm(b)

# Classical solution.
x_classical = np.linalg.solve(A, b)
x_classical = x_classical / np.linalg.norm(x_classical)
print(f"Classical solution (normalized): {x_classical}")

# In a real HHL implementation, the matrix A would be block-encoded.
# Then QSVT applies a polynomial approximation to 1/x on the singular values of A.
# Here we directly compute the polynomial form for illustration.

# Simplified: condition number of A.
eigenvalues = np.linalg.eigvalsh(A)
kappa = max(eigenvalues) / min(eigenvalues)
print(f"Condition number: {kappa:.2f}")
print(f"Polynomial degree needed for 1/x approx: ~{int(np.log(kappa))}")
print(f"HHL gate count: O(log({A.shape[0]}) * kappa * log(1/epsilon)) = O({int(np.log2(A.shape[0]) * kappa * 5)})")

The HHL workflow in production:

  1. Block-encode AA (tutorial 29).
  2. Construct a polynomial approximation to 1/x1/x over the singular values of AA.
  3. Apply the polynomial via QSVT (tutorial 30).
  4. The result is the amplitude-encoded inverse-applied state.

The QSVT-based HHL is dramatically faster than the original phase-estimation HHL — the constant factors and asymptotic scaling are both better.

Common misconceptions

“HHL gives exponential speedup for any linear system.” False. The speedup requires sparse + well-conditioned + quantum-input + quantum-output regimes simultaneously. Generic dense large-κ\kappa problems are NOT exponentially sped up.

“HHL is dead because of dequantization.” Partially. Specific applications that motivated HHL (Kerenidis-Prakash recommendations, generic ML applications) are dequantizable. Other applications (Hamiltonian-grounded chemistry, quantum-input regression) survive.

“HHL is a black-box quantum primitive.” It’s not — implementing HHL requires real engineering decisions about input encoding, polynomial approximation, condition-number budget, etc. The QSVT-based variant has different engineering challenges than the original phase-estimation one.

“HHL is competing with classical solvers in 2026.” It’s not yet — small-scale HHL demos exist, but HHL on real-world ill-conditioned matrices is far from beating tuned classical iterative solvers in production. The competition is on specific structured problems where the structural advantage is provable.

Decision rule

Use HHL or HHL-derived algorithms when:

  1. The input has structure that classical SQ doesn’t capture. Quantum-native input or specifically structured sparse problems.
  2. The output is consumed as a quantum state. No need to read out classical components.
  3. The condition number is modest. κ\kappa in the hundreds, not millions.
  4. You’re implementing a fault-tolerant subroutine. HHL is a structural primitive in qubitization-based algorithms.

Don’t use HHL when:

  1. You need classical solution components. Output bottleneck eats the speedup.
  2. The matrix is large and dense without QRAM. Input loading dominates.
  3. You’re solving a problem where Tang-style dequantization applies. Run a classical algorithm with comparable parameters first.

For most production linear-algebra workflows in 2026, HHL is not the right tool. For structural quantum-algorithm research and for fault-tolerant primitives, HHL-derived techniques are central.

Exercises

1. Why output-readout dominates

You compute x=A1b|x\rangle = A^{-1} |b\rangle via HHL in O(logN)O(\log N) time. To read out all NN components classically, what is the total runtime?

Show answer

Reading out one component requires measurement, costing 1/N1/\sqrt{N} precision per measurement. To get ϵ\epsilon-precision on each component requires 1/ϵ21/\epsilon^2 measurements. To get NN components: N×1/ϵ2N \times 1/\epsilon^2 measurements. The full classical readout takes O(N/ϵ2)O(N / \epsilon^2) measurements, which is much worse than the O(logN)O(\log N) HHL runtime — and worse than a classical algorithm to begin with. HHL’s speedup only applies to applications where you don’t need full classical readout. This is one of the most important misunderstandings about HHL.

2. Condition number and runtime

A real-world linear system has N=109N = 10^9 and κ=104\kappa = 10^4. Compare HHL’s runtime against classical conjugate gradient.

Show answer

HHL: O(logNκ)=O(30104)=3×105O(\log N \cdot \kappa) = O(30 \cdot 10^4) = 3 \times 10^5. Classical CG: O(Nsκ)O(N \cdot s \cdot \sqrt{\kappa}) for sparse case ≈ O(10910100)=1012O(10^9 \cdot 10 \cdot 100) = 10^{12} for s=10s = 10. HHL is 7 orders of magnitude faster. For sparse well-conditioned systems, HHL’s speedup is real even at modest κ\kappa. The catch is: this is quantum runtime, requiring the quantum machine to exist. With current 2026 hardware (no fault-tolerant quantum computer), running HHL on this system size is impossible. Classical CG runs now.

3. The dequantization match

A 2024 paper claims HHL achieves O(logN)O(\log N) for a structured matrix-inversion problem. Tang-style dequantization gives O(poly(logN))O(\text{poly}(\log N)) for the same problem with classical SQ access. Is the quantum speedup meaningful?

Show answer

If both are polylogarithmic in NN but the quantum is O(logN)O(\log N) and classical is O((logN)k)O((\log N)^k) for some k>1k > 1, the speedup is polynomial in logN\log N — not exponential, not nothing. Useful but not transformative. The honest description would be “polynomial speedup over classical SQ” rather than “exponential speedup over classical.” Many post-2018 HHL claims have been recharacterized this way as the dequantization implications were absorbed.

4. HHL as a subroutine

A larger algorithm uses HHL as a subroutine, computing A1bA^{-1} |b\rangle to use in a downstream quantum step. The downstream step is itself BQP-complete. Does HHL retain its speedup in this context?

Show answer

Yes. The downstream step is a quantum operation with no classical SQ analog (BQP-complete subroutines are not dequantizable). The HHL output is consumed quantumly without classical readout. The full algorithm enjoys the quantum speedup, with HHL contributing logarithmic-in-NN matrix-inversion subroutines. This is the structurally cleanest application of HHL: as part of a fault-tolerant pipeline that uses quantum states throughout. HHL as a quantum-pipeline primitive survives dequantization fully. The speedup in pure quantum algorithms is intact; it’s only the classical-input/classical-output framing that is dequantizable.

Where this goes next

Tutorial 59 covers LCU (Linear Combination of Unitaries) — the framework that powers HHL, qubitization, and modern Hamiltonian simulation. Tutorial 60 covers quantum phase estimation precision analysis, the technical workhorse of HHL and many related algorithms.


Weekly dispatch

Quantum, for people who already code.

One serious tutorial per week, plus the industry moves that actually matter. No hype, no hand-waving.

Free. Unsubscribe anytime. We will never sell your email.