LCU: The Linear Combination of Unitaries Framework
Linear combination of unitaries (LCU) is a quantum-algorithm framework that lets you implement non-unitary linear operations on a quantum state, with cost proportional to the number of unitary terms. Together with block encoding (tutorial 29) and qubitization (tutorial 30), LCU is the structural backbone of modern Hamiltonian simulation, HHL-style algorithms, and many other quantum-algorithm primitives. This tutorial covers the LCU lemma, the cost structure, and where LCU shows up in production algorithms.
Prerequisites: Tutorial 29: Block Encoding, Tutorial 31: Hamiltonian Simulation
A quantum algorithm is fundamentally a sequence of unitary operations. But many useful operations on quantum states are not unitary: matrix inversion (), polynomial transformations ( for arbitrary ), and exponentials of non-Hermitian operators ( for general ). How do you implement non-unitary operations on a quantum computer that only does unitaries?
The Linear Combination of Unitaries (LCU) framework, introduced by Berry-Childs-Kothari 2014/2015, gives the answer: any linear operation that decomposes as a weighted sum of unitaries can be implemented on a quantum state, at a cost proportional to the number of unitary terms and the “norm” of the coefficient vector. This is the structural seed of modern quantum algorithms — Hamiltonian simulation, HHL, qubitization, and QSVT all reduce to applying LCU primitives in clever ways.
This tutorial covers the LCU lemma, the cost structure, and shows where LCU appears in real algorithm constructions.
The lemma
LCU lemma. Let where and each is unitary. Define (the “1-norm” of the coefficient vector). Given:
- A PREPARE oracle: an efficient circuit that prepares the state on a register of ancilla qubits.
- A SELECT oracle: a controlled operation that applies to the data register when the ancilla register is .
There is a quantum circuit that, applied to (ancilla register in zero state, data in state ), produces the state
where is some state orthogonal to . Measuring the ancilla register: outcome has probability proportional to , and conditional on this outcome the data register is in the desired state .
The “non-unitary” is implemented as a probabilistic quantum operation that succeeds with probability per attempt. With amplitude amplification, the expected number of repetitions is , giving total LCU cost uses of PREPARE + SELECT.
The two oracles
The PREPARE oracle is essentially a state-preparation circuit. For a coefficient vector with entries, PREPARE requires ancilla qubits and gates (for structured coefficient distributions; more for arbitrary).
The SELECT oracle is a controlled-application of one of the unitaries based on the ancilla. SELECT cost depends on the structure of the unitaries:
- For Pauli strings: SELECT costs depth using standard multi-controlled gates.
- For generic unitaries: SELECT costs where is the cost of one unitary. This is the dominant cost.
In practice, LCU implementations are tightly bound to the structure of the unitaries — Pauli strings get cheap SELECT, generic unitaries pay polynomially.
How LCU connects to block encoding
Tutorial 29 introduced block encoding: a unitary on a larger Hilbert space whose top-left block equals a given matrix (up to normalization). LCU is exactly a block encoding for :
- The PREPARE + SELECT + PREPARE^-1 circuit is a block encoding of .
- The block-encoding subnormalization is .
So LCU is one of the simplest constructive recipes for block-encoding a matrix. If you can write as a sum of unitaries with cheap PREPARE and SELECT, you have a block encoding for free.
This is why modern quantum-algorithm papers use LCU notation: it’s a constructive way to talk about block-encoding via the explicit unitary decomposition. The block-encoding then plugs into qubitization (tutorial 30) for polynomial transformations or QSVT for arbitrary functions.
Hamiltonian simulation via LCU
The original application: simulate where (a sum of Pauli terms, the standard Hamiltonian decomposition for chemistry molecules).
Naive Trotter approach: , with error . To achieve -error: . Total gates: .
LCU approach (Berry-Childs-Kothari 2014): decompose as a Taylor series:
This is itself a sum of unitaries (each expands to a sum of Pauli products). Apply LCU. The cost: — exponentially better scaling than Trotter’s polynomial scaling.
LCU-based Hamiltonian simulation is exponentially faster than Trotter for high-precision regimes. Modern qubitization-based simulation (tutorial 30) is even better, but it builds directly on the LCU framework.
A small LCU demonstration
Concrete code applying LCU to a sum of two unitaries:
import numpy as np
import pennylane as qml
n_data_qubits = 2
n_ancilla = 1 # for 2 unitaries
total_qubits = n_data_qubits + n_ancilla
dev = qml.device("default.qubit", wires=total_qubits)
def prepare_oracle(coeffs):
"""Prepare |sqrt(alpha)> = sum_i sqrt(alpha_i / alpha) |i>."""
# Two coefficients: just need to put ancilla in correct superposition.
alpha = sum(coeffs)
# cos(theta/2) ~ sqrt(alpha_0/alpha), sin(theta/2) ~ sqrt(alpha_1/alpha)
theta = 2 * np.arccos(np.sqrt(coeffs[0] / alpha))
qml.RY(theta, wires=n_data_qubits)
def select_oracle():
"""Apply U_0 on data when ancilla is |0>, U_1 when |1>."""
# U_0 = X (X gate on first data qubit)
qml.PauliX(wires=0).inv() if False else None # placeholder for U_0
# Controlled by ancilla, apply U_1 (Z on first data qubit)
qml.ctrl(qml.PauliZ, control=n_data_qubits)(wires=0)
# When ancilla is |0>, apply U_0 = identity (no operation).
@qml.qnode(dev)
def lcu_circuit(coeffs, input_state):
"""LCU for A = alpha_0 X + alpha_1 Z applied to data."""
qml.AmplitudeEmbedding(input_state, wires=range(n_data_qubits), normalize=True)
prepare_oracle(coeffs)
select_oracle()
qml.adjoint(prepare_oracle)(coeffs)
return qml.state()
# Apply A = 0.5*X + 0.5*Z to a Bell-like state.
input_state = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
coeffs = [0.5, 0.5]
output = lcu_circuit(coeffs, input_state)
print(f"LCU output (4-qubit state, ancilla in last position):")
print(output)
The output is a state where the first amplitudes (ancilla = branch) correspond to the desired component. To read out, post-select on ancilla = 0 with success probability .
The PREPARE and SELECT oracle costs grow with the number of terms: 2 terms here is trivial; chemistry Hamiltonians with thousands of Pauli terms require larger PREPARE circuits.
Common misconceptions
“LCU lets you implement any non-unitary operation.” Only those expressible as linear combinations of efficiently-implementable unitaries. Arbitrary linear operators don’t decompose this way.
“LCU is one specific algorithm.” It’s a framework. Different choices of unitaries and PREPARE/SELECT implementations give different algorithms — Hamiltonian simulation, HHL, ground-state preparation, etc. all use LCU as the structural primitive.
“LCU success probability is high.” For non-trivial , the success probability is . To get high success, amplitude amplify, but this costs runtime. The “free” linearity comes at the cost of polynomial-in- runtime.
“LCU coefficients can be negative.” The basic lemma assumes . Negative coefficients require sign-tracking (typically by absorbing into the unitaries or using a phase ancilla). The extension is straightforward; the cost is the same.
Decision rule
Use LCU thinking when:
- You need to apply a non-unitary linear operation to a quantum state. Matrix inversion, polynomial functions, exponentials of non-Hermitian operators.
- You can decompose the target as a sum of efficiently-implementable unitaries. Hamiltonians as sums of Paulis, low-rank matrices as sums of rank-1 unitaries, etc.
- You’re building a block encoding for a structured matrix. LCU gives constructive block encodings.
Don’t use LCU when:
- The unitary decomposition has too many terms. scales linearly in the number of terms; for -term decompositions LCU is too slow.
- You can implement the target operation directly as a unitary. No need for LCU.
For most modern quantum-algorithm research in 2026, LCU is the structural primitive that connects “specific algorithm” to “general framework.” Algorithms are designed by specifying the LCU decomposition that solves the problem.
Exercises
1. Cost scaling with alpha
A target operator has all (uniform weights). Compute and the LCU cost.
Show answer
. PREPARE state: , a uniform superposition over 1000 indices. PREPARE cost: qubits and gates per Hadamard. SELECT cost: depends on the unitaries; for Pauli strings, depth. Per-LCU-call cost: . Number of attempts to succeed: if no amplitude amplification, with amplification. Total cost: gate-equivalents. The multiplier is the dominant factor.
2. PREPARE for chemistry Hamiltonians
A chemistry Hamiltonian for H2 has ~10 Pauli terms. For LiH it has ~250 terms. For BeH2 it has ~700 terms. How does the PREPARE oracle’s cost scale?
Show answer
PREPARE cost is where is the number of terms — for state-preparation depth, not the actual gate count of building the right amplitude distribution. For arbitrary coefficient distributions, PREPARE needs gates in the worst case, but for structured distributions (e.g., Pauli weights of physical Hamiltonians), more efficient PREPARE is possible. For 700 terms (BeH2), PREPARE depth is ~, but gate count for arbitrary state preparation is . This gate count is amortized across many LCU calls: the cost matters only at the level of total scaling, not per-step.
3. LCU vs Trotter for Hamiltonian simulation
For a Hamiltonian with and target accuracy over time , compare LCU and Trotter gate counts.
Show answer
Trotter cost: gate-equivalents (assuming linear-in-precision scaling). LCU cost: gate-equivalents. Eight orders of magnitude difference, in favor of LCU. This is why fault-tolerant chemistry resource estimates use LCU/qubitization-based simulation, not Trotter, despite Trotter being conceptually simpler. The LCU advantage compounds for high-precision tasks.
4. When LCU isn’t enough
A target operator has Pauli terms (e.g., a quantum-chemistry FCI matrix in a large basis). LCU’s scaling makes this infeasible. What alternative would you use?
Show answer
For too-large-to-decompose Hamiltonians, the alternative is sparse-access block encoding (Berry-Kieferova 2018, Childs-Kothari-Somma 2017): encode the Hamiltonian via a sparsity oracle that returns one nonzero matrix entry at a time, given a row index. The block encoding cost scales as where is the maximum row sparsity, not where is the sum of coefficients. For sparse Hamiltonians, sparse-access is exponentially better than LCU. For dense Hamiltonians with too many terms, factored representations (low-rank approximations, tensor-network compression) are the practical workarounds. LCU is the right tool when is moderate; for huge , switch to sparse access.
Where this goes next
Tutorial 60 covers the precision analysis of quantum phase estimation — the technical workhorse that LCU and HHL use under the hood. Together, tutorials 57-60 cover the framework primitives (walks, HHL, LCU, phase estimation precision) that power modern quantum algorithms. The track is now substantially complete; future tutorials may dig into specific algorithm classes (quantum optimization beyond QAOA, adiabatic algorithms, post-Trotter simulation methods).