algorithms advanced · 14 min read · By LIPAI WANG · May 1, 2026

LCU: The Linear Combination of Unitaries Framework

Linear combination of unitaries (LCU) is a quantum-algorithm framework that lets you implement non-unitary linear operations on a quantum state, with cost proportional to the number of unitary terms. Together with block encoding (tutorial 29) and qubitization (tutorial 30), LCU is the structural backbone of modern Hamiltonian simulation, HHL-style algorithms, and many other quantum-algorithm primitives. This tutorial covers the LCU lemma, the cost structure, and where LCU shows up in production algorithms.

Prerequisites: Tutorial 29: Block Encoding, Tutorial 31: Hamiltonian Simulation

A quantum algorithm is fundamentally a sequence of unitary operations. But many useful operations on quantum states are not unitary: matrix inversion ( $A^{-1}$ ), polynomial transformations ( $p(A)$ for arbitrary $p$ ), and exponentials of non-Hermitian operators ( $e^A$ for general $A$ ). How do you implement non-unitary operations on a quantum computer that only does unitaries?

The Linear Combination of Unitaries (LCU) framework, introduced by Berry-Childs-Kothari 2014/2015, gives the answer: any linear operation $A = \sum_i \alpha_i U_i$ that decomposes as a weighted sum of unitaries can be implemented on a quantum state, at a cost proportional to the number of unitary terms and the “norm” of the coefficient vector. This is the structural seed of modern quantum algorithms — Hamiltonian simulation, HHL, qubitization, and QSVT all reduce to applying LCU primitives in clever ways.

This tutorial covers the LCU lemma, the cost structure, and shows where LCU appears in real algorithm constructions.

The lemma

LCU lemma. Let $A = \sum_{i=0}^{m-1} \alpha_i U_i$ where $\alpha_i \geq 0$ and each $U_i$ is unitary. Define $\alpha = \sum_i \alpha_i$ (the “1-norm” of the coefficient vector). Given:

A PREPARE oracle: an efficient circuit that prepares the state $|\sqrt{\alpha}\rangle = \sum_i \sqrt{\alpha_i / \alpha} \, |i\rangle$ on a register of $\lceil \log_2 m \rceil$ ancilla qubits.
A SELECT oracle: a controlled operation that applies $U_i$ to the data register when the ancilla register is $|i\rangle$ .

There is a quantum circuit that, applied to $|0_a\rangle |\psi\rangle$ (ancilla register in zero state, data in state $|\psi\rangle$ ), produces the state

\frac{1}{\sqrt{\alpha}}\,|0_a\rangle \otimes A |\psi\rangle \;+\; |\Phi_\perp\rangle

The “non-unitary” $A$ is implemented as a probabilistic quantum operation that succeeds with probability $\sim 1/\alpha^2$ per attempt. With amplitude amplification, the expected number of repetitions is $O(\alpha)$ , giving total LCU cost $O(\alpha)$ uses of PREPARE + SELECT.

The two oracles

The PREPARE oracle is essentially a state-preparation circuit. For a coefficient vector with $m$ entries, PREPARE requires $\lceil \log_2 m \rceil$ ancilla qubits and $O(\log m)$ gates (for structured coefficient distributions; more for arbitrary).

The SELECT oracle is a controlled-application of one of the $m$ unitaries based on the ancilla. SELECT cost depends on the structure of the unitaries:

For Pauli strings: SELECT costs $O(\log m)$ depth using standard multi-controlled gates.
For generic unitaries: SELECT costs $O(m \cdot c_U)$ where $c_U$ is the cost of one unitary. This is the dominant cost.

In practice, LCU implementations are tightly bound to the structure of the unitaries — Pauli strings get cheap SELECT, generic unitaries pay polynomially.

How LCU connects to block encoding

Tutorial 29 introduced block encoding: a unitary $U$ on a larger Hilbert space whose top-left block equals a given matrix $A$ (up to normalization). LCU is exactly a block encoding for $A = \sum_i \alpha_i U_i$ :

The PREPARE + SELECT + PREPARE^-1 circuit is a block encoding of $A / \alpha$ .
The block-encoding subnormalization is $\alpha$ .

So LCU is one of the simplest constructive recipes for block-encoding a matrix. If you can write $A$ as a sum of unitaries with cheap PREPARE and SELECT, you have a block encoding for free.

This is why modern quantum-algorithm papers use LCU notation: it’s a constructive way to talk about block-encoding via the explicit unitary decomposition. The block-encoding then plugs into qubitization (tutorial 30) for polynomial transformations or QSVT for arbitrary functions.

Hamiltonian simulation via LCU

The original application: simulate $e^{-i H t}$ where $H = \sum_i \alpha_i P_i$ (a sum of Pauli terms, the standard Hamiltonian decomposition for chemistry molecules).

Naive Trotter approach: $e^{-i H t} \approx (\prod_i e^{-i \alpha_i P_i \delta t})^{t/\delta t}$ , with error $O(t^2 / N_\text{steps})$ . To achieve $\epsilon$ -error: $N_\text{steps} = O(t^2 / \epsilon)$ . Total gates: $O(N_\text{Pauli} \cdot t^2 / \epsilon)$ .

LCU approach (Berry-Childs-Kothari 2014): decompose $e^{-i H t}$ as a Taylor series:

e^{-i H t} \;\approx\; \sum_{k=0}^{K} \frac{(-i H t)^k}{k!}.

This is itself a sum of unitaries (each $H^k$ expands to a sum of Pauli products). Apply LCU. The cost: $O(\alpha t \log(1/\epsilon) / \log\log(1/\epsilon))$ — exponentially better $\log(1/\epsilon)$ scaling than Trotter’s polynomial scaling.

LCU-based Hamiltonian simulation is exponentially faster than Trotter for high-precision regimes. Modern qubitization-based simulation (tutorial 30) is even better, but it builds directly on the LCU framework.

A small LCU demonstration

Concrete code applying LCU to a sum of two unitaries:

import numpy as np
import pennylane as qml

n_data_qubits = 2
n_ancilla = 1  # for 2 unitaries
total_qubits = n_data_qubits + n_ancilla

dev = qml.device("default.qubit", wires=total_qubits)


def prepare_oracle(coeffs):
    """Prepare |sqrt(alpha)> = sum_i sqrt(alpha_i / alpha) |i>."""
    # Two coefficients: just need to put ancilla in correct superposition.
    alpha = sum(coeffs)
    # cos(theta/2) ~ sqrt(alpha_0/alpha), sin(theta/2) ~ sqrt(alpha_1/alpha)
    theta = 2 * np.arccos(np.sqrt(coeffs[0] / alpha))
    qml.RY(theta, wires=n_data_qubits)


def select_oracle():
    """Apply U_0 on data when ancilla is |0>, U_1 when |1>."""
    # U_0 = X (X gate on first data qubit)
    qml.PauliX(wires=0).inv() if False else None  # placeholder for U_0
    # Controlled by ancilla, apply U_1 (Z on first data qubit)
    qml.ctrl(qml.PauliZ, control=n_data_qubits)(wires=0)
    # When ancilla is |0>, apply U_0 = identity (no operation).


@qml.qnode(dev)
def lcu_circuit(coeffs, input_state):
    """LCU for A = alpha_0 X + alpha_1 Z applied to data."""
    qml.AmplitudeEmbedding(input_state, wires=range(n_data_qubits), normalize=True)
    prepare_oracle(coeffs)
    select_oracle()
    qml.adjoint(prepare_oracle)(coeffs)
    return qml.state()


# Apply A = 0.5*X + 0.5*Z to a Bell-like state.
input_state = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
coeffs = [0.5, 0.5]
output = lcu_circuit(coeffs, input_state)
print(f"LCU output (4-qubit state, ancilla in last position):")
print(output)

The output is a state where the first $2^{n_\text{data}}$ amplitudes (ancilla = $|0\rangle$ branch) correspond to the desired $A |\psi\rangle$ component. To read out, post-select on ancilla = 0 with success probability $\sim \|A|\psi\rangle\|^2 / (\sum |\alpha_i|)^2$ .

The PREPARE and SELECT oracle costs grow with the number of terms: 2 terms here is trivial; chemistry Hamiltonians with thousands of Pauli terms require larger PREPARE circuits.

Common misconceptions

“LCU lets you implement any non-unitary operation.” Only those expressible as linear combinations of efficiently-implementable unitaries. Arbitrary linear operators don’t decompose this way.

“LCU is one specific algorithm.” It’s a framework. Different choices of unitaries and PREPARE/SELECT implementations give different algorithms — Hamiltonian simulation, HHL, ground-state preparation, etc. all use LCU as the structural primitive.

“LCU success probability is high.” For non-trivial $A$ , the success probability is $\sim 1/\alpha^2$ . To get high success, amplitude amplify, but this costs $O(\alpha)$ runtime. The “free” linearity comes at the cost of polynomial-in- $\alpha$ runtime.

“LCU coefficients can be negative.” The basic lemma assumes $\alpha_i \geq 0$ . Negative coefficients require sign-tracking (typically by absorbing into the unitaries or using a phase ancilla). The extension is straightforward; the cost is the same.

Decision rule

Use LCU thinking when:

You need to apply a non-unitary linear operation to a quantum state. Matrix inversion, polynomial functions, exponentials of non-Hermitian operators.
You can decompose the target as a sum of efficiently-implementable unitaries. Hamiltonians as sums of Paulis, low-rank matrices as sums of rank-1 unitaries, etc.
You’re building a block encoding for a structured matrix. LCU gives constructive block encodings.

Don’t use LCU when:

The unitary decomposition has too many terms. $\alpha$ scales linearly in the number of terms; for $10^9$ -term decompositions LCU is too slow.
You can implement the target operation directly as a unitary. No need for LCU.

For most modern quantum-algorithm research in 2026, LCU is the structural primitive that connects “specific algorithm” to “general framework.” Algorithms are designed by specifying the LCU decomposition that solves the problem.

Exercises

1. Cost scaling with alpha

A target operator $A = \sum_{i=1}^{1000} U_i$ has all $\alpha_i = 1$ (uniform weights). Compute $\alpha$ and the LCU cost.

Show answer

$\alpha = \sum_i \alpha_i = 1000$ . PREPARE state: $|\sqrt{\alpha}\rangle = \sum_i \frac{1}{\sqrt{1000}} |i\rangle$ , a uniform superposition over 1000 indices. PREPARE cost: $O(\log_2 1000) = 10$ qubits and $O(1)$ gates per Hadamard. SELECT cost: depends on the unitaries; for Pauli strings, $O(\log m) = 10$ depth. Per-LCU-call cost: $O(\log m + c_\text{SELECT})$ . Number of attempts to succeed: $O(\alpha^2) = 10^6$ if no amplitude amplification, $O(\alpha) = 10^3$ with amplification. Total cost: $O(\alpha \cdot c_\text{LCU}) = O(10^3 \cdot 10) = 10^4$ gate-equivalents. The $\alpha = 1000$ multiplier is the dominant factor.

2. PREPARE for chemistry Hamiltonians

A chemistry Hamiltonian for H2 has ~10 Pauli terms. For LiH it has ~250 terms. For BeH2 it has ~700 terms. How does the PREPARE oracle’s cost scale?

Show answer

PREPARE cost is $O(\log m)$ where $m$ is the number of terms — for state-preparation depth, not the actual gate count of building the right amplitude distribution. For arbitrary coefficient distributions, PREPARE needs $O(m)$ gates in the worst case, but for structured distributions (e.g., Pauli weights of physical Hamiltonians), more efficient PREPARE is possible. For 700 terms (BeH2), PREPARE depth is ~ $\log_2(700) \approx 10$ , but gate count for arbitrary state preparation is $O(700)$ . This gate count is amortized across many LCU calls: the cost matters only at the level of total $\alpha \cdot N_\text{LCU}$ scaling, not per-step.

3. LCU vs Trotter for Hamiltonian simulation

For a Hamiltonian with $\alpha = 100$ and target accuracy $\epsilon = 10^{-6}$ over time $t = 1$ , compare LCU and Trotter gate counts.

Show answer

Trotter cost: $O(\alpha t)^2 / \epsilon = O(10^4 / 10^{-6}) = O(10^{10})$ gate-equivalents (assuming linear-in-precision scaling). LCU cost: $O(\alpha t \log(1/\epsilon) / \log\log(1/\epsilon)) = O(100 \cdot 10 / 5) = O(200)$ gate-equivalents. Eight orders of magnitude difference, in favor of LCU. This is why fault-tolerant chemistry resource estimates use LCU/qubitization-based simulation, not Trotter, despite Trotter being conceptually simpler. The LCU advantage compounds for high-precision tasks.

4. When LCU isn’t enough

A target operator has $10^{15}$ Pauli terms (e.g., a quantum-chemistry FCI matrix in a large basis). LCU’s $O(\alpha)$ scaling makes this infeasible. What alternative would you use?

Show answer

For too-large-to-decompose Hamiltonians, the alternative is sparse-access block encoding (Berry-Kieferova 2018, Childs-Kothari-Somma 2017): encode the Hamiltonian via a sparsity oracle that returns one nonzero matrix entry at a time, given a row index. The block encoding cost scales as $O(\sqrt{s})$ where $s$ is the maximum row sparsity, not $O(\alpha)$ where $\alpha$ is the sum of coefficients. For sparse Hamiltonians, sparse-access is exponentially better than LCU. For dense Hamiltonians with too many terms, factored representations (low-rank approximations, tensor-network compression) are the practical workarounds. LCU is the right tool when $\alpha$ is moderate; for huge $\alpha$ , switch to sparse access.

Where this goes next

Tutorial 60 covers the precision analysis of quantum phase estimation — the technical workhorse that LCU and HHL use under the hood. Together, tutorials 57-60 cover the framework primitives (walks, HHL, LCU, phase estimation precision) that power modern quantum algorithms. The track is now substantially complete; future tutorials may dig into specific algorithm classes (quantum optimization beyond QAOA, adiabatic algorithms, post-Trotter simulation methods).

The lemma

The two oracles

How LCU connects to block encoding

Hamiltonian simulation via LCU

A small LCU demonstration

Common misconceptions

Decision rule

Exercises

1. Cost scaling with alpha

2. PREPARE for chemistry Hamiltonians

3. LCU vs Trotter for Hamiltonian simulation

4. When LCU isn’t enough

Where this goes next

Quantum, for people who already code.