Quantum Outpost
algorithms advanced · 14 min read ·

LCU: The Linear Combination of Unitaries Framework

Linear combination of unitaries (LCU) is a quantum-algorithm framework that lets you implement non-unitary linear operations on a quantum state, with cost proportional to the number of unitary terms. Together with block encoding (tutorial 29) and qubitization (tutorial 30), LCU is the structural backbone of modern Hamiltonian simulation, HHL-style algorithms, and many other quantum-algorithm primitives. This tutorial covers the LCU lemma, the cost structure, and where LCU shows up in production algorithms.

Prerequisites: Tutorial 29: Block Encoding, Tutorial 31: Hamiltonian Simulation

A quantum algorithm is fundamentally a sequence of unitary operations. But many useful operations on quantum states are not unitary: matrix inversion (A1A^{-1}), polynomial transformations (p(A)p(A) for arbitrary pp), and exponentials of non-Hermitian operators (eAe^A for general AA). How do you implement non-unitary operations on a quantum computer that only does unitaries?

The Linear Combination of Unitaries (LCU) framework, introduced by Berry-Childs-Kothari 2014/2015, gives the answer: any linear operation A=iαiUiA = \sum_i \alpha_i U_i that decomposes as a weighted sum of unitaries can be implemented on a quantum state, at a cost proportional to the number of unitary terms and the “norm” of the coefficient vector. This is the structural seed of modern quantum algorithms — Hamiltonian simulation, HHL, qubitization, and QSVT all reduce to applying LCU primitives in clever ways.

This tutorial covers the LCU lemma, the cost structure, and shows where LCU appears in real algorithm constructions.

The lemma

LCU lemma. Let A=i=0m1αiUiA = \sum_{i=0}^{m-1} \alpha_i U_i where αi0\alpha_i \geq 0 and each UiU_i is unitary. Define α=iαi\alpha = \sum_i \alpha_i (the “1-norm” of the coefficient vector). Given:

  1. A PREPARE oracle: an efficient circuit that prepares the state α=iαi/αi|\sqrt{\alpha}\rangle = \sum_i \sqrt{\alpha_i / \alpha} \, |i\rangle on a register of log2m\lceil \log_2 m \rceil ancilla qubits.
  2. A SELECT oracle: a controlled operation that applies UiU_i to the data register when the ancilla register is i|i\rangle.

There is a quantum circuit that, applied to 0aψ|0_a\rangle |\psi\rangle (ancilla register in zero state, data in state ψ|\psi\rangle), produces the state

1α0aAψ  +  Φ\frac{1}{\sqrt{\alpha}}\,|0_a\rangle \otimes A |\psi\rangle \;+\; |\Phi_\perp\rangle

where Φ|\Phi_\perp\rangle is some state orthogonal to 0a|0_a\rangle. Measuring the ancilla register: outcome 0a|0_a\rangle has probability proportional to Aψ2/α2\|A |\psi\rangle\|^2 / \alpha^2, and conditional on this outcome the data register is in the desired state Aψ/AψA |\psi\rangle / \|A |\psi\rangle\|.

The “non-unitary” AA is implemented as a probabilistic quantum operation that succeeds with probability 1/α2\sim 1/\alpha^2 per attempt. With amplitude amplification, the expected number of repetitions is O(α)O(\alpha), giving total LCU cost O(α)O(\alpha) uses of PREPARE + SELECT.

The two oracles

The PREPARE oracle is essentially a state-preparation circuit. For a coefficient vector with mm entries, PREPARE requires log2m\lceil \log_2 m \rceil ancilla qubits and O(logm)O(\log m) gates (for structured coefficient distributions; more for arbitrary).

The SELECT oracle is a controlled-application of one of the mm unitaries based on the ancilla. SELECT cost depends on the structure of the unitaries:

  • For Pauli strings: SELECT costs O(logm)O(\log m) depth using standard multi-controlled gates.
  • For generic unitaries: SELECT costs O(mcU)O(m \cdot c_U) where cUc_U is the cost of one unitary. This is the dominant cost.

In practice, LCU implementations are tightly bound to the structure of the unitaries — Pauli strings get cheap SELECT, generic unitaries pay polynomially.

How LCU connects to block encoding

Tutorial 29 introduced block encoding: a unitary UU on a larger Hilbert space whose top-left block equals a given matrix AA (up to normalization). LCU is exactly a block encoding for A=iαiUiA = \sum_i \alpha_i U_i:

  • The PREPARE + SELECT + PREPARE^-1 circuit is a block encoding of A/αA / \alpha.
  • The block-encoding subnormalization is α\alpha.

So LCU is one of the simplest constructive recipes for block-encoding a matrix. If you can write AA as a sum of unitaries with cheap PREPARE and SELECT, you have a block encoding for free.

This is why modern quantum-algorithm papers use LCU notation: it’s a constructive way to talk about block-encoding via the explicit unitary decomposition. The block-encoding then plugs into qubitization (tutorial 30) for polynomial transformations or QSVT for arbitrary functions.

Hamiltonian simulation via LCU

The original application: simulate eiHte^{-i H t} where H=iαiPiH = \sum_i \alpha_i P_i (a sum of Pauli terms, the standard Hamiltonian decomposition for chemistry molecules).

Naive Trotter approach: eiHt(ieiαiPiδt)t/δte^{-i H t} \approx (\prod_i e^{-i \alpha_i P_i \delta t})^{t/\delta t}, with error O(t2/Nsteps)O(t^2 / N_\text{steps}). To achieve ϵ\epsilon-error: Nsteps=O(t2/ϵ)N_\text{steps} = O(t^2 / \epsilon). Total gates: O(NPaulit2/ϵ)O(N_\text{Pauli} \cdot t^2 / \epsilon).

LCU approach (Berry-Childs-Kothari 2014): decompose eiHte^{-i H t} as a Taylor series:

eiHt    k=0K(iHt)kk!.e^{-i H t} \;\approx\; \sum_{k=0}^{K} \frac{(-i H t)^k}{k!}.

This is itself a sum of unitaries (each HkH^k expands to a sum of Pauli products). Apply LCU. The cost: O(αtlog(1/ϵ)/loglog(1/ϵ))O(\alpha t \log(1/\epsilon) / \log\log(1/\epsilon)) — exponentially better log(1/ϵ)\log(1/\epsilon) scaling than Trotter’s polynomial scaling.

LCU-based Hamiltonian simulation is exponentially faster than Trotter for high-precision regimes. Modern qubitization-based simulation (tutorial 30) is even better, but it builds directly on the LCU framework.

A small LCU demonstration

Concrete code applying LCU to a sum of two unitaries:

import numpy as np
import pennylane as qml

n_data_qubits = 2
n_ancilla = 1  # for 2 unitaries
total_qubits = n_data_qubits + n_ancilla

dev = qml.device("default.qubit", wires=total_qubits)


def prepare_oracle(coeffs):
    """Prepare |sqrt(alpha)> = sum_i sqrt(alpha_i / alpha) |i>."""
    # Two coefficients: just need to put ancilla in correct superposition.
    alpha = sum(coeffs)
    # cos(theta/2) ~ sqrt(alpha_0/alpha), sin(theta/2) ~ sqrt(alpha_1/alpha)
    theta = 2 * np.arccos(np.sqrt(coeffs[0] / alpha))
    qml.RY(theta, wires=n_data_qubits)


def select_oracle():
    """Apply U_0 on data when ancilla is |0>, U_1 when |1>."""
    # U_0 = X (X gate on first data qubit)
    qml.PauliX(wires=0).inv() if False else None  # placeholder for U_0
    # Controlled by ancilla, apply U_1 (Z on first data qubit)
    qml.ctrl(qml.PauliZ, control=n_data_qubits)(wires=0)
    # When ancilla is |0>, apply U_0 = identity (no operation).


@qml.qnode(dev)
def lcu_circuit(coeffs, input_state):
    """LCU for A = alpha_0 X + alpha_1 Z applied to data."""
    qml.AmplitudeEmbedding(input_state, wires=range(n_data_qubits), normalize=True)
    prepare_oracle(coeffs)
    select_oracle()
    qml.adjoint(prepare_oracle)(coeffs)
    return qml.state()


# Apply A = 0.5*X + 0.5*Z to a Bell-like state.
input_state = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
coeffs = [0.5, 0.5]
output = lcu_circuit(coeffs, input_state)
print(f"LCU output (4-qubit state, ancilla in last position):")
print(output)

The output is a state where the first 2ndata2^{n_\text{data}} amplitudes (ancilla = 0|0\rangle branch) correspond to the desired AψA |\psi\rangle component. To read out, post-select on ancilla = 0 with success probability Aψ2/(αi)2\sim \|A|\psi\rangle\|^2 / (\sum |\alpha_i|)^2.

The PREPARE and SELECT oracle costs grow with the number of terms: 2 terms here is trivial; chemistry Hamiltonians with thousands of Pauli terms require larger PREPARE circuits.

Common misconceptions

“LCU lets you implement any non-unitary operation.” Only those expressible as linear combinations of efficiently-implementable unitaries. Arbitrary linear operators don’t decompose this way.

“LCU is one specific algorithm.” It’s a framework. Different choices of unitaries and PREPARE/SELECT implementations give different algorithms — Hamiltonian simulation, HHL, ground-state preparation, etc. all use LCU as the structural primitive.

“LCU success probability is high.” For non-trivial AA, the success probability is 1/α2\sim 1/\alpha^2. To get high success, amplitude amplify, but this costs O(α)O(\alpha) runtime. The “free” linearity comes at the cost of polynomial-in-α\alpha runtime.

“LCU coefficients can be negative.” The basic lemma assumes αi0\alpha_i \geq 0. Negative coefficients require sign-tracking (typically by absorbing into the unitaries or using a phase ancilla). The extension is straightforward; the cost is the same.

Decision rule

Use LCU thinking when:

  1. You need to apply a non-unitary linear operation to a quantum state. Matrix inversion, polynomial functions, exponentials of non-Hermitian operators.
  2. You can decompose the target as a sum of efficiently-implementable unitaries. Hamiltonians as sums of Paulis, low-rank matrices as sums of rank-1 unitaries, etc.
  3. You’re building a block encoding for a structured matrix. LCU gives constructive block encodings.

Don’t use LCU when:

  1. The unitary decomposition has too many terms. α\alpha scales linearly in the number of terms; for 10910^9-term decompositions LCU is too slow.
  2. You can implement the target operation directly as a unitary. No need for LCU.

For most modern quantum-algorithm research in 2026, LCU is the structural primitive that connects “specific algorithm” to “general framework.” Algorithms are designed by specifying the LCU decomposition that solves the problem.

Exercises

1. Cost scaling with alpha

A target operator A=i=11000UiA = \sum_{i=1}^{1000} U_i has all αi=1\alpha_i = 1 (uniform weights). Compute α\alpha and the LCU cost.

Show answer

α=iαi=1000\alpha = \sum_i \alpha_i = 1000. PREPARE state: α=i11000i|\sqrt{\alpha}\rangle = \sum_i \frac{1}{\sqrt{1000}} |i\rangle, a uniform superposition over 1000 indices. PREPARE cost: O(log21000)=10O(\log_2 1000) = 10 qubits and O(1)O(1) gates per Hadamard. SELECT cost: depends on the unitaries; for Pauli strings, O(logm)=10O(\log m) = 10 depth. Per-LCU-call cost: O(logm+cSELECT)O(\log m + c_\text{SELECT}). Number of attempts to succeed: O(α2)=106O(\alpha^2) = 10^6 if no amplitude amplification, O(α)=103O(\alpha) = 10^3 with amplification. Total cost: O(αcLCU)=O(10310)=104O(\alpha \cdot c_\text{LCU}) = O(10^3 \cdot 10) = 10^4 gate-equivalents. The α=1000\alpha = 1000 multiplier is the dominant factor.

2. PREPARE for chemistry Hamiltonians

A chemistry Hamiltonian for H2 has ~10 Pauli terms. For LiH it has ~250 terms. For BeH2 it has ~700 terms. How does the PREPARE oracle’s cost scale?

Show answer

PREPARE cost is O(logm)O(\log m) where mm is the number of terms — for state-preparation depth, not the actual gate count of building the right amplitude distribution. For arbitrary coefficient distributions, PREPARE needs O(m)O(m) gates in the worst case, but for structured distributions (e.g., Pauli weights of physical Hamiltonians), more efficient PREPARE is possible. For 700 terms (BeH2), PREPARE depth is ~log2(700)10\log_2(700) \approx 10, but gate count for arbitrary state preparation is O(700)O(700). This gate count is amortized across many LCU calls: the cost matters only at the level of total αNLCU\alpha \cdot N_\text{LCU} scaling, not per-step.

3. LCU vs Trotter for Hamiltonian simulation

For a Hamiltonian with α=100\alpha = 100 and target accuracy ϵ=106\epsilon = 10^{-6} over time t=1t = 1, compare LCU and Trotter gate counts.

Show answer

Trotter cost: O(αt)2/ϵ=O(104/106)=O(1010)O(\alpha t)^2 / \epsilon = O(10^4 / 10^{-6}) = O(10^{10}) gate-equivalents (assuming linear-in-precision scaling). LCU cost: O(αtlog(1/ϵ)/loglog(1/ϵ))=O(10010/5)=O(200)O(\alpha t \log(1/\epsilon) / \log\log(1/\epsilon)) = O(100 \cdot 10 / 5) = O(200) gate-equivalents. Eight orders of magnitude difference, in favor of LCU. This is why fault-tolerant chemistry resource estimates use LCU/qubitization-based simulation, not Trotter, despite Trotter being conceptually simpler. The LCU advantage compounds for high-precision tasks.

4. When LCU isn’t enough

A target operator has 101510^{15} Pauli terms (e.g., a quantum-chemistry FCI matrix in a large basis). LCU’s O(α)O(\alpha) scaling makes this infeasible. What alternative would you use?

Show answer

For too-large-to-decompose Hamiltonians, the alternative is sparse-access block encoding (Berry-Kieferova 2018, Childs-Kothari-Somma 2017): encode the Hamiltonian via a sparsity oracle that returns one nonzero matrix entry at a time, given a row index. The block encoding cost scales as O(s)O(\sqrt{s}) where ss is the maximum row sparsity, not O(α)O(\alpha) where α\alpha is the sum of coefficients. For sparse Hamiltonians, sparse-access is exponentially better than LCU. For dense Hamiltonians with too many terms, factored representations (low-rank approximations, tensor-network compression) are the practical workarounds. LCU is the right tool when α\alpha is moderate; for huge α\alpha, switch to sparse access.

Where this goes next

Tutorial 60 covers the precision analysis of quantum phase estimation — the technical workhorse that LCU and HHL use under the hood. Together, tutorials 57-60 cover the framework primitives (walks, HHL, LCU, phase estimation precision) that power modern quantum algorithms. The track is now substantially complete; future tutorials may dig into specific algorithm classes (quantum optimization beyond QAOA, adiabatic algorithms, post-Trotter simulation methods).


Weekly dispatch

Quantum, for people who already code.

One serious tutorial per week, plus the industry moves that actually matter. No hype, no hand-waving.

Free. Unsubscribe anytime. We will never sell your email.