Block Encoding: How Modern Quantum Algorithms Get Arbitrary Matrices Into Quantum Circuits
Block encoding is the most important quantum-algorithm primitive of the past decade. It is the technique that lets you embed an arbitrary matrix A into a unitary U so a quantum computer can act with A on a state — making Hamiltonian simulation, linear-system solvers, and the entire QSVT framework possible. This tutorial defines block encoding precisely, gives the key constructions (LCU, qubitization, sparse-matrix), and shows why this is the abstraction that ate quantum algorithms after 2015.
Prerequisites: Tutorial 11: QFT and Phase Estimation
If you learned quantum algorithms from a 2015 textbook, the canonical examples were Shor’s, Grover’s, and quantum phase estimation — algorithms designed around specific problem structures, each with its own ad-hoc trick. If you learn quantum algorithms in 2026, you instead start with a single primitive — block encoding — and almost every modern algorithm becomes a special case of how you process the block-encoded matrix afterward.
This shift is the single biggest change in the field’s algorithmic landscape since Shor. It is the reason Hamiltonian simulation got polynomial-time-optimal in the dependence on simulation accuracy, the reason linear-system solvers (HHL and successors) became practical to discuss as production tools rather than complexity-theory toys, and the reason quantum singular value transformation (QSVT, tutorial 30) is now the standard framing for quantum algorithm design.
This tutorial is the conceptual prerequisite for all of that. It defines block encoding precisely, gives the three standard constructions, and shows why the abstraction works.
The problem block encoding solves
Quantum computers natively act with unitary operators. The state-vector evolution is for some unitary . Many problems we want to solve, however, are most naturally phrased in terms of non-unitary matrices:
- A Hamiltonian is Hermitian but not generally unitary. Hamiltonian simulation needs , which is unitary, but the underlying computational object is itself.
- A linear system has a non-unitary . To prepare a state proportional to , you need to act with , which is also non-unitary.
- A polynomial of a matrix is generically non-unitary even if is unitary.
- A least-squares solution uses the pseudoinverse , also non-unitary.
The classical workaround on a quantum computer is to embed the non-unitary matrix into a larger unitary — using extra ancilla qubits to host the rest of the unitary structure — and arrange things so that the action of the matrix on the data appears as one block of the larger unitary. The data qubits experience the matrix; the ancilla qubits absorb the unitary completion. This embedding is block encoding.
The definition
Let be an matrix with . A unitary acting on qubits is an -block encoding of if
Three parameters are doing the work in this definition.
- is the subnormalization factor. Block encoding represents inside the unitary; you have to multiply by to recover the actual matrix. The subnormalization must satisfy so that , which is necessary for to fit inside a unitary at all.
- is the ancilla count. The block encoding uses extra qubits beyond the data qubits.
- is the encoding error. A perfect block encoding has ; in practice we tolerate some approximation error.
The structural picture: write as an -qubit unitary. Project onto on the ancilla register and read off the upper-left block of . That block is approximately .
In matrix form:
where the entries fill out the unitary completion and are not required to have any particular structure. As long as the upper-left block matches to error , the unitary is a valid block encoding.
Using a block encoding
Suppose is an -block encoding of , and you want to apply (up to subnormalization) to a state . The protocol:
- Prepare .
- Apply .
- Measure the ancilla register. Postselect on the all-zero outcome.
If you observe on the ancilla, the data register is now in the state — exactly acting on , normalized. The probability of the desired ancilla outcome is , which is since .
Two practical observations:
- Subnormalization costs success probability. A larger means a smaller success probability per attempt. Aggressively reducing (making the block encoding “tight”) is one of the main optimization targets in block-encoding constructions.
- Amplitude amplification fixes most of the success-probability problem. Tutorial 10 showed that Grover-style amplitude amplification turns a probability- event into a probability- event in rounds. So a block encoding with and amplitude amplification gives an effective overhead — modest, polynomial, and tunable.
Construction 1: Linear combination of unitaries (LCU)
The most general block-encoding construction is linear combination of unitaries (LCU), introduced by Childs and Wiegert in 2012 in the context of Hamiltonian simulation but quickly recognized as a universal block-encoding technique.
Suppose you can write as
where each is unitary and is a real coefficient (positive WLOG). Define .
The LCU block encoding uses two ancilla registers: a -qubit “select” register and the data register. Two unitaries do the work:
- PREP prepares the state on the select register.
- SEL is the controlled unitary — applies to the data when the select register is in state .
The full circuit is an -block encoding of . The subnormalization is ; the ancilla cost is logarithmic in the number of terms.
This construction is universal in the sense that any matrix with a known decomposition into unitaries (e.g., a Pauli decomposition: where each is a Pauli string) admits an LCU block encoding immediately. The cost depends on the decomposition: short decompositions (few terms, small ) give cheap block encodings.
Construction 2: Qubitization
For Hermitian , qubitization (Low and Chuang, 2017) gives a particularly elegant block encoding. The idea: rather than building from a sum, build it from a single oracle that “walks” on the eigenvalues of .
Given an oracle that prepares the state — a uniform superposition over eigenvectors of weighted by their eigenvalues — qubitization constructs a single unitary that acts as a 2D rotation on each eigenvalue-labeled subspace. Specifically, the eigenvalues of are for each eigenvalue of .
Why this matters: is a block encoding of in a stronger sense — it not only embeds but does so in a way that all eigenvalue-dependent subblocks are 2D rotations whose angles are simple functions of the eigenvalues. This structure is what enables QSVT (tutorial 30) to apply arbitrary polynomials to by composing rotations on these 2D subspaces.
Qubitization typically uses fewer ancillas than LCU — often just 1 — and achieves the optimal subnormalization for many matrix families. The catch: it requires the matrix to come with a specific oracle structure, which not all problems naturally provide.
Construction 3: Sparse-matrix block encoding
Many physically interesting matrices are sparse: the number of nonzero entries per row is , much less than the matrix dimension. A sparse-matrix block encoding takes specified by two oracles:
- : given where is a row and is a column-index counter, return the column index of the -th nonzero entry in row .
- : given , return the value (or its compressed-precision representation).
The block encoding combines and via an LCU-like trick to produce a unitary that block-encodes , where is the largest matrix entry by absolute value. Subnormalization: . Ancilla cost: . Query cost to the oracles: per block-encoding application.
This construction is the workhorse for Hamiltonian simulation of physically motivated systems (lattice models, molecular Hamiltonians) where sparsity is natural and the oracles are easy to construct from the problem data. Most of the published Hamiltonian-simulation algorithms in 2026 use a sparse-matrix block encoding under the hood.
A small Python example
Here is the Pauli-decomposition LCU construction made concrete. We block-encode a Hermitian matrix as a sum of Pauli strings, then verify the upper-left block of the resulting unitary equals .
import numpy as np
from itertools import product
# Pauli matrices.
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = {"I": I, "X": X, "Y": Y, "Z": Z}
def pauli_decompose(A: np.ndarray) -> dict:
"""Decompose a 2^n x 2^n Hermitian matrix into Pauli strings."""
n = int(np.log2(A.shape[0]))
coeffs = {}
for labels in product("IXYZ", repeat=n):
P = PAULIS[labels[0]]
for label in labels[1:]:
P = np.kron(P, PAULIS[label])
coef = np.trace(P @ A) / (2**n)
if abs(coef) > 1e-10:
coeffs["".join(labels)] = coef.real
return coeffs
def lcu_block_encoding(coeffs: dict) -> np.ndarray:
"""Build the LCU block encoding for a Pauli-decomposed Hermitian matrix.
Returns the unitary U on (a + n) qubits, where 2^a >= number of Pauli terms.
"""
n = len(next(iter(coeffs))) # qubits in data register
L = len(coeffs) # number of Pauli terms
a = int(np.ceil(np.log2(max(L, 2)))) # ancilla qubits
L_padded = 2**a
alphas = np.array(list(coeffs.values()))
alphas_padded = np.zeros(L_padded)
alphas_padded[:L] = np.abs(alphas)
signs = np.sign(alphas)
alpha_sum = alphas_padded.sum()
# PREP: prepares sqrt(alpha_j / alpha_sum) |j>.
prep_amplitudes = np.sqrt(alphas_padded / alpha_sum)
prep = np.zeros((L_padded, L_padded), dtype=complex)
prep[:, 0] = prep_amplitudes
# Fill remaining columns to make prep unitary (Gram-Schmidt).
for col in range(1, L_padded):
v = np.zeros(L_padded); v[col] = 1
for prev in range(col):
v = v - (prep[:, prev].conj() @ v) * prep[:, prev]
nv = np.linalg.norm(v)
if nv > 1e-10:
prep[:, col] = v / nv
# SEL: block diagonal of signed Pauli strings.
pauli_strings = list(coeffs.keys())
sel = np.zeros((L_padded * 2**n, L_padded * 2**n), dtype=complex)
for j in range(L_padded):
if j < L:
P = PAULIS[pauli_strings[j][0]]
for label in pauli_strings[j][1:]:
P = np.kron(P, PAULIS[label])
sel[j*2**n:(j+1)*2**n, j*2**n:(j+1)*2**n] = signs[j] * P
else:
sel[j*2**n:(j+1)*2**n, j*2**n:(j+1)*2**n] = np.eye(2**n)
U = np.kron(prep.conj().T, np.eye(2**n)) @ sel @ np.kron(prep, np.eye(2**n))
return U, alpha_sum, a
# Test: block-encode a random 4x4 Hermitian matrix.
np.random.seed(0)
A = np.random.randn(4, 4) + 1j * np.random.randn(4, 4)
A = (A + A.conj().T) / 2
coeffs = pauli_decompose(A)
print(f"Pauli decomposition has {len(coeffs)} terms; alpha = {sum(abs(c) for c in coeffs.values()):.3f}")
U, alpha, a = lcu_block_encoding(coeffs)
print(f"Block encoding uses {a} ancilla qubits.")
print(f"Subnormalization alpha = {alpha:.3f}")
# Verify: upper-left 4x4 block of U equals A / alpha.
upper_left = U[:4, :4]
recovered_A = alpha * upper_left
print(f"Reconstruction error ||A - alpha * upper_left||: {np.linalg.norm(A - recovered_A):.2e}")
Sample output:
Pauli decomposition has 16 terms; alpha = 4.387
Block encoding uses 4 ancilla qubits.
Subnormalization alpha = 4.387
Reconstruction error ||A - alpha * upper_left||: 1.57e-15
The construction is exact up to floating-point precision. Four data dimensions plus four ancilla qubits gives a 256-dimensional unitary whose upper-left block is .
This is the LCU recipe in 50 lines of Python. Production block-encoding implementations in Qiskit or Cirq are more sophisticated (they avoid materializing the full unitary, they use efficient Pauli SELECT circuits, they exploit sparsity), but the structural picture is identical.
Cost summary
The key cost parameters for any block encoding:
| Parameter | What it measures | Typical scaling |
|---|---|---|
| (subnormalization) | Amplitude-amplification overhead | for LCU; for qubitization |
| (ancilla count) | Qubit overhead | for LCU; for qubitization on 1-sparse oracles |
| query complexity | Cost per use | to oracles; constructions vary |
| (encoding error) | Accuracy | Tunable; usually is propagated through the rest of the algorithm |
The art of designing a quantum algorithm using block encoding is choosing the construction whose cost parameters match the structure of the problem. Pauli-decomposition LCU is great when the Pauli decomposition is short. Qubitization is great when the matrix has natural eigenvalue access. Sparse-matrix block encoding is great when the matrix is sparse and has constructive oracles.
Why this took over the field
Three reasons block encoding became the algorithmic abstraction of choice after 2015:
- Compositional algebra. Block encodings of and can be combined into a block encoding of , , , or any polynomial using QSVT. This means once you have block encoded the relevant matrices, you can reason about quantum algorithms purely in terms of matrix arithmetic, with no need to redo the circuit-level work.
- Optimal Hamiltonian simulation. Block-encoding-based Hamiltonian simulation (using qubitization or QSVT) achieves the optimal scaling — strictly better than any product-formula approach on most physical Hamiltonians. This is the reason every modern Hamiltonian-simulation algorithm uses a block encoding under the hood (tutorial 31).
- Algorithmic uniformity. A single primitive (block encoding) plus a single transformation (QSVT) generates Hamiltonian simulation, linear-system solvers, eigenvalue estimation, ground-state preparation, and many more algorithms. The pre-2015 quantum-algorithm zoo was a collection of disparate tricks; the post-2015 zoo is largely a list of block-encoding recipes plus polynomial choices in QSVT.
This is why a 2026 quantum-algorithms class spends weeks on block encoding before getting to specific algorithms. The primitive is the algorithm.
Common misconceptions
“Block encoding is just LCU.” LCU is one construction. Qubitization, sparse-matrix oracles, and direct unitary completions are also standard. The umbrella concept is broader; LCU is the easiest entry point.
“Subnormalization is a fundamental quantum-physics overhead.” No. The subnormalization is a consequence of how we choose to embed a non-unitary into a unitary ; tighter encodings have smaller , better encodings (qubitization) achieve which is the minimum allowed. Subnormalization is a design choice, not a physical law.
“Block encoding is a complexity-theory abstraction with no practical impact.” False. Modern Hamiltonian-simulation libraries (in Qiskit, Cirq, PennyLane) use block-encoding-based methods as their default. Quantum machine-learning algorithms post-2018 use block encodings for matrix arithmetic. Most published quantum-algorithm research papers since 2018 cite block encoding centrally. It is the working primitive of the current field.
“Block encoding requires fault tolerance to be useful.” No. Many small-scale block-encoding demonstrations have been published on NISQ hardware. The full power of block encoding (long algorithms, deep QSVT polynomials) does need fault tolerance, but the primitive itself is hardware-agnostic.
Decision rule
When you read a quantum-algorithm paper or proposal, ask:
- Is the algorithm framed in terms of block encoding? If yes, the analysis is probably modern and clean. If no, it is either a pre-2015 algorithm being re-presented, or one of the small set of algorithms (Grover, QFT) that does not naturally fit the framework.
- What is the subnormalization ? This is the dominant cost factor. A tight is the difference between practical and impractical.
- Which construction (LCU, qubitization, sparse oracle)? This determines the ancilla overhead and the access model. The construction has to match the available hardware capabilities.
- What polynomial is being applied via QSVT? If the algorithm is post-2017, it almost certainly is a QSVT polynomial. The polynomial choice is the algorithm’s “secret sauce.” Ask what it is and why.
Understanding these four questions is the difference between reading modern quantum-algorithm papers fluently and reading them as a sequence of tricks.
Exercises
1. Subnormalization minimum
Show that any block encoding of a matrix must have (operator norm). Why is this a fundamental lower bound?
Show answer
The block encoding embeds as an upper-left block of a unitary . Any block of a unitary has operator norm at most 1: . Therefore , i.e., . The minimum is achieved by qubitization for many matrix families. LCU achieves which is in general strictly larger than the operator norm — the LCU subnormalization is tight on some matrices but loose on others.
2. Pauli decomposition cost
Decompose the 2-qubit Hamiltonian into Pauli strings. What is the LCU subnormalization ? What is the operator norm ? Are they equal?
Show answer
The Pauli decomposition is given: 3 terms, each with coefficient 1. So . The operator norm: is the well-known SWAP-related matrix; its eigenvalues are (it’s ), so . In this case LCU is tight — . The terms align constructively. For more complicated Hamiltonians, LCU is generically loose.
3. When LCU is loose
Construct a Hermitian matrix where the LCU subnormalization exceeds by a factor of 2. Hint: think about cancellations in a Pauli decomposition.
Show answer
Take . Trivially zero with LCU coefficients , so but . The ratio is infinite. For a non-trivial example: , which has , but if you naively decompose as with multiple cancelling terms, you can get as large as you like. The general lesson: LCU is loose precisely when the Pauli decomposition has cancellation between same-sign and opposite-sign terms. Qubitization or a smarter LCU regrouping fixes this.
4. Ancilla budget
You have a Hamiltonian on 50 qubits with 1,000 Pauli terms. Estimate the ancilla qubits needed for an LCU block encoding. What about a sparse-matrix block encoding if the Hamiltonian is 4-sparse?
Show answer
LCU: ancillas, plus the 50 data qubits = 60 qubits total. Sparse-matrix: ancilla qubits. Sparse-matrix is somewhat cheaper in ancillas, but its subnormalization is , which can be larger than the LCU subnormalization for matrices where the Pauli coefficients are well-distributed. The “right” choice depends on the structure of the Hamiltonian — most modern Hamiltonian-simulation papers benchmark both.
Where this goes next
Tutorial 30 takes block encoding and applies it via quantum singular value transformation (QSVT) — the technique that turns “I have a block encoding of ” into “I can compute for any low-degree polynomial in essentially the same cost as computing one block-encoding round.” This is the abstraction that ties Hamiltonian simulation, linear-system solving, and eigenvalue estimation into a single framework.