quantum ml advanced · 16 min read · By LIPAI WANG · April 29, 2026

The Data-Loading Bottleneck: Why Quantum Machine Learning on Classical Data Rarely Delivers

The data-loading bottleneck is the structural reason most quantum machine learning on classical data does not deliver speedups. Loading $N$ classical bits into a quantum register typically costs $O(N)$ time — eating any algorithm's hoped-for $O(\log N)$ speedup. This tutorial covers the encoding options (amplitude, basis, angle), the QRAM construction and its open hardware question, and the regimes where the bottleneck is and is not binding.

Prerequisites: Tutorial 41: Tang Dequantization

Tutorial 41 explained why most QML on classical data gets dequantized — the input-access model that gives the quantum algorithm its speedup also gives a classical algorithm enough access to keep up. This tutorial looks at the same problem from the hardware angle: the data-loading bottleneck.

The bottleneck is structural. To run a quantum algorithm on classical data, you have to encode the data into a quantum state. The cheapest encoding for $N$ classical bits takes $O(N)$ time and $O(\log N)$ qubits, but the more useful encodings (amplitude encoding, the kind QML algorithms typically need) take $O(N)$ time and $O(N)$ qubits with naive constructions, or assume access to quantum random access memory (QRAM) — a piece of hardware that does not yet exist at any practical scale.

Either way, $O(N)$ time appears somewhere in the runtime. If your quantum algorithm promises an $O(\log N)$ speedup over a classical algorithm taking $O(N)$ , the data-loading step is right there in your runtime and the speedup is gone. The bottleneck is the structural reason why most “exponential” QML claims either implicitly assume QRAM or quietly absorb the loading cost.

This tutorial covers the encoding options, the QRAM construction and its open hardware question, the regimes where the bottleneck is and is not binding, and an honest decision rule for evaluating QML claims that touch classical data.

The three encodings

Loading $N$ classical real numbers $\{x_1, \ldots, x_N\}$ into a quantum register can be done several ways, with different costs and use cases.

Basis encoding

Encode each classical bit as one qubit in computational basis. $N$ bits → $N$ qubits, with $|x\rangle = |x_1 x_2 \ldots x_N\rangle$ .

Time: $O(N)$ — apply an X gate to each qubit that needs flipping.
Qubits: $O(N)$ .
Use cases: Classical preprocessing, oracle inputs, problems where the structure is inherently bit-level.

Basis encoding is the cheapest in terms of preparation time but the most expensive in qubit count. It is the natural input model for Grover-style search and oracle-based algorithms.

Amplitude encoding

Encode $N$ classical real numbers as the amplitudes of a superposition: $|x\rangle = \sum_{i=1}^N x_i |i\rangle / \|\boldsymbol{x}\|$ .

Time: $O(N)$ for naive direct construction, $O(\log N)$ if QRAM is available.
Qubits: $O(\log N)$ — exponentially fewer qubits than basis encoding.
Use cases: Most QML algorithms (HHL-style linear systems, quantum kernels, quantum PCA) assume amplitude encoding.

Amplitude encoding is the bottleneck encoding: it is qubit-efficient (logarithmic in data size) but time-prohibitive for naive direct construction. The whole question of whether QML delivers speedups depends on whether you have efficient amplitude encoding.

Angle encoding

Encode $N$ classical real numbers as rotation angles on $N$ qubits: $|x\rangle = \prod_{i=1}^N e^{-i x_i \sigma_y / 2} |0\rangle$ .

Time: $O(N)$ — apply each rotation in sequence.
Qubits: $O(N)$ .
Use cases: Variational QML, quantum kernels with simple feature maps.

Angle encoding is the practical compromise: easy to implement on hardware, no QRAM required, and natural fit for variational circuits. It is the dominant encoding in VQE-style and QML-classifier-style applications.

The structural fact: only amplitude encoding gives the qubit-efficient input model that exponential-speedup QML claims rely on. Angle and basis encodings are honest about their $O(N)$ cost.

The QRAM construction

Quantum random access memory (QRAM) is a hardware construct that, given $N$ classical data values stored at $N$ classical addresses, supports the quantum query

|\text{address}\rangle |\text{junk}\rangle \;\to\; |\text{address}\rangle |x_\text{address}\rangle.

Crucially, QRAM is supposed to support this in superposition: a query $\sum_a \alpha_a |a\rangle |\text{junk}\rangle$ becomes $\sum_a \alpha_a |a\rangle |x_a\rangle$ . With QRAM, amplitude-encoded loading takes $O(\log N)$ time — fast enough to preserve the exponential speedup of an $O(\log N)$ quantum algorithm.

The standard QRAM construction is bucket-brigade QRAM (Giovannetti, Lloyd, Maccone 2008): a tree of quantum routers, $N - 1$ routers in total, that route the address bits to the corresponding stored value. The query takes $O(\log N)$ depth.

The open question: can bucket-brigade QRAM be built at scale on real hardware?

The state of the art as of 2026:

No large-scale QRAM has been built. The largest demonstrated QRAMs are at single-digit-bit scale, in laboratory experiments.
The fault-tolerance overhead is high. Even with active error correction, each router gate is a quantum operation that must be done fault-tolerantly, and there are $O(N)$ such routers in the QRAM. Resource estimates suggest QRAM is more expensive than the algorithms it serves, in practice.
No clear hardware architecture. Bucket-brigade QRAM was proposed for nondemolition optical memories or NV centers, but neither has been demonstrated at scale.
Some QRAM-free alternatives exist. Block encoding (tutorial 30) and certain quantum-data-loading techniques can avoid explicit QRAM in some algorithms.

The 2026 picture: QRAM is theoretically necessary for many QML speedups but not practically available. Most “QML with exponential speedup” claims are quietly assuming a QRAM that does not exist at scale.

Why this matters

Combine the three observations:

Most QML algorithms assume amplitude encoding for their input.
Amplitude encoding without QRAM takes $O(N)$ time (naive construction).
QRAM does not exist at scale.

Therefore: most QML algorithms, run end-to-end on real hardware, are bottlenecked by the input loading step. A quantum algorithm that takes $O(\log N)$ on the loaded state but spends $O(N)$ loading the state from classical data has effective runtime $O(N)$ — the same as classical algorithms.

This is the data-loading bottleneck, and it is the structural reason why claimed exponential QML speedups have not materialized in practice. It is also the deeper reason behind Tang’s dequantization argument: the input model that would give the exponential speedup (QRAM access) is also the input model that gives classical algorithms the SQ access they need to keep up.

The bottleneck is not absolute. There are workarounds, and there are regimes where it does not bind.

When the bottleneck does not bind

Three regimes where the data-loading bottleneck is not the dominant cost:

1. Quantum-native data

If your “data” is the output of a quantum simulation, a quantum sensor, or a previous quantum computation, there is no classical-to-quantum loading step. The data starts in quantum form. Tutorial 42 covered QCNNs as a clean example: phase classification operates directly on quantum states from a simulator, not on classical descriptions.

This is the strongest escape from the bottleneck. Most genuinely quantum-advantage QML in 2026 lives in this regime.

2. Sparse data

If the classical data is sparse — most entries are zero or have a known structure — efficient amplitude encoding is sometimes possible without QRAM. Specific algorithms for sparse-vector loading have $O(\text{sparsity}) \log N$ runtime instead of $O(N)$ .

This is a partial escape. Real data is often sparse, but how sparse depends on the application. Recommendation systems and certain physical Hamiltonians are sparse enough for these methods to help.

3. Restart-amortized data

If you are going to query the same data many times across different algorithm runs (e.g., a quantum machine-learning model that classifies many test inputs against the same training set), the QRAM-loading cost is amortized over the queries.

In this regime, even an $O(N)$ one-time loading is acceptable if subsequent operations are $O(\text{poly}(\log N))$ each. Practically, this is the regime that makes QRAM-style architectures defensible — not because each query is fast, but because the per-query cost amortizes.

The bottleneck binds most acutely for single-query problems where you load data, run one quantum operation, and discard the result. Most production ML inference is single-query at the per-input level, even if the model itself is reused, so the amortization story has to be carefully matched to the deployment pattern.

Practical encoding choices in 2026

What real systems actually do:

Variational QML: angle encoding. Natural fit for parameterized circuits, no QRAM needed, $O(N)$ per circuit run is acceptable when the algorithm is itself NISQ-scale.
Quantum kernels: angle encoding with custom feature maps. Tutorial 16 covered this.
VQE and chemistry: Hartree-Fock state preparation followed by ansatz. Inputs are problem parameters (Hamiltonian coefficients), not bulk data.
HHL and linear systems: require amplitude encoding. In production, this means classical preprocessing to a structured form (sparse, low-rank) where efficient encodings exist.
QML on quantum-native data: no encoding needed. The state is already quantum.

The pattern: production QML carefully avoids the configurations where the data-loading bottleneck would bind. Either the data is quantum-native, or the encoding is angle-style with $O(N)$ cost absorbed into the algorithm runtime, or the bulk data has special structure (sparse, low-rank) that allows efficient loading.

A small encoding-cost calculator

Here is a Python sketch that computes the loading-vs-algorithm time tradeoff for a generic QML algorithm:

import math

def qml_runtime_with_loading(
    n_data: int,
    encoding: str,
    has_qram: bool,
    n_queries: int = 1,
    quantum_runtime_log_factor: float = 1.0,
) -> dict:
    """
    Compute total runtime for a QML algorithm including data-loading.
    n_data: number of classical data points
    encoding: 'amplitude', 'angle', or 'basis'
    has_qram: whether QRAM is available
    n_queries: how many algorithm runs share the same data (amortization)
    quantum_runtime_log_factor: the prefactor on log(n_data) for the quantum part
    """
    # Per-loading cost.
    if encoding == 'amplitude' and has_qram:
        loading_per_call = math.log2(n_data)
    elif encoding == 'amplitude':
        loading_per_call = n_data
    elif encoding == 'angle':
        loading_per_call = n_data
    elif encoding == 'basis':
        loading_per_call = n_data
    else:
        raise ValueError(encoding)

    # Quantum algorithm runtime per call.
    quantum_per_call = quantum_runtime_log_factor * math.log2(n_data)

    # Amortized total.
    total_quantum = quantum_per_call * n_queries
    total_loading = loading_per_call if has_qram else loading_per_call * n_queries
    grand_total = total_loading + total_quantum

    classical_baseline = n_data * n_queries  # classical algorithm scaling.

    return {
        "loading_total": total_loading,
        "quantum_runtime_total": total_quantum,
        "grand_total_quantum": grand_total,
        "classical_baseline": classical_baseline,
        "speedup_factor": classical_baseline / grand_total,
    }


# Example: 1 million data points, single-query QML, amplitude encoding without QRAM.
r = qml_runtime_with_loading(n_data=10**6, encoding='amplitude', has_qram=False, n_queries=1)
print(f"Single-query, no QRAM: speedup = {r['speedup_factor']:.2f}x")
# Output: speedup = 1.00 (loading dominates; no actual speedup)

# Same with QRAM (hypothetical):
r = qml_runtime_with_loading(n_data=10**6, encoding='amplitude', has_qram=True, n_queries=1)
print(f"Single-query, with QRAM: speedup = {r['speedup_factor']:.2f}x")
# Output: speedup ~ 50,000 (QRAM gives the claimed exponential speedup)

# 1 million queries amortized:
r = qml_runtime_with_loading(n_data=10**6, encoding='amplitude', has_qram=False, n_queries=10**6)
print(f"Amortized over 1M queries: speedup = {r['speedup_factor']:.2f}x")
# Output: speedup ~ 50,000 (amortization recovers the speedup even without QRAM)

The pattern is clear: without QRAM, the speedup exists only when amortized over many queries. With QRAM, the speedup is real for single-query workloads. Most public QML claims assume one of these two regimes; checking which one is the right diagnostic question for any specific claim.

Common misconceptions

“The data-loading bottleneck is just an engineering issue.” It’s a structural one. Even with arbitrarily good engineering, loading $N$ classical numbers into amplitude encoding takes time proportional to $N$ unless you have QRAM-grade quantum data structures. The bottleneck is fundamental to the cost model, not to current technology.

“QRAM will be solved soon.” No serious roadmap claims to ship a useful QRAM in the next 5 years. The fault-tolerance overhead alone makes QRAM more expensive than the algorithms it serves, by current estimates. QRAM-free alternatives (block encoding, structured-data preparation) are the more likely practical path.

“Variational QML doesn’t have a data-loading bottleneck.” Variational QML uses angle encoding, which is $O(N)$ . The bottleneck is technically there but absorbed into the per-circuit runtime. Whether this is “fine” depends on whether the algorithm gives a structural advantage worth the loading cost — usually it does not for classical-data inputs (tutorial 41).

“Block encoding sidesteps QRAM.” Block encoding (tutorial 30) provides efficient access to operator entries but does not eliminate the need for input-data loading. It helps when the data has structure that makes block encoding cheap; it does not help for generic dense data.

“The dequantization wave killed QML, so the data-loading bottleneck is moot.” They are two angles on the same problem. Dequantization shows that if you have efficient classical access to the data (SQ oracle), classical algorithms can match many quantum claims. The data-loading bottleneck shows that if you have inefficient quantum access (no QRAM), quantum algorithms can’t deliver their claims either. Both observations point to the same structural conclusion: classical-data QML does not give exponential speedups in practical settings.

Decision rule

When you read a QML algorithm claim:

What encoding is assumed? Amplitude → check QRAM assumption. Angle/basis → expect $O(N)$ in the runtime.
Is QRAM assumed and at what scale? If yes → flag as “depends on QRAM availability.” No production QRAM exists at scale; the claim is conditional on future hardware.
Is the data quantum-native? If yes → no loading cost; the structural advantage may survive.
Is the workload amortized over many queries? If yes → the loading cost amortizes; the speedup may be real for that workload.
Has the paper benchmarked end-to-end (including loading) against classical algorithms? Most do not. Demand this for any practical-runtime claim.

A QML algorithm that survives all five questions has a credible runtime story. Most do not — they assume QRAM, assume single-query workloads, and skip the end-to-end benchmark. The data-loading bottleneck is the diagnostic that most often catches incomplete claims.

Exercises

1. Basis vs amplitude encoding

You want to encode a 1024-bit string. Using basis encoding requires 1024 qubits and 1024 X-gate operations. Using amplitude encoding (in principle) requires 10 qubits and an unknown number of operations. What is the minimum number of operations for amplitude encoding without QRAM, and why?

Show answer

Without QRAM, amplitude encoding of an arbitrary $N$ -amplitude state takes $\Omega(N)$ operations because each amplitude is a continuous parameter that must be specified, and quantum gates can only specify one parameter per gate. Naive amplitude encoding constructions take exactly $O(N)$ gates, with $O(\log N)$ qubits. Amplitude encoding saves qubits but not gates. You trade a 100× qubit saving for the same gate count as basis encoding. This is why amplitude encoding is preferred in algorithms that exploit the qubit savings (logarithmic-time downstream operations) but not in algorithms where total gate count is the limit. The true win of amplitude encoding requires QRAM — exponentially faster gate complexity, not just qubit complexity.

2. When sparse data helps

You have a sparse vector with $K$ non-zero entries among $N$ total positions. Compare the cost of basis encoding vs amplitude encoding (with sparse-aware methods).

Show answer

Basis encoding of a sparse vector still costs $O(N)$ qubits because every position needs a qubit. Sparse-aware amplitude encoding uses ancilla-qubit tricks (rotation+CNOT trees) to encode only the $K$ non-zero amplitudes. Cost: $O(K \log(N/K))$ gates and $O(\log N)$ qubits. For very sparse data ( $K \ll N$ ), this can be a real win — encoding $K = 100$ non-zero entries among $N = 10^6$ positions takes $\sim 1{,}500$ gates instead of $10^6$ . Sparse-aware methods are one of the practical escape valves for the data-loading bottleneck.

3. Why QRAM doesn’t fit on a 1000-qubit machine

A 1000-qubit machine has, in principle, enough qubits for $\log_2(N)$ index qubits to address $N \sim 10^{300}$ data points. Why does it not actually have QRAM for that many data points?

Show answer

Bucket-brigade QRAM requires $O(N)$ quantum routers, each one a small circuit element. To address $N = 10^{300}$ data points, you’d need $10^{300}$ routers — vastly more qubits than the addressing bits suggest. The address space is logarithmic in $N$ ; the QRAM hardware is linear. The qubits available in a small machine are nowhere near enough to construct meaningful QRAM. This is why even modest QRAM ( $N = 10^6$ ) requires millions of routers, far beyond current hardware. The qubit count needed for QRAM scales like the data size, not like the index size, which is the engineering problem QRAM has not solved.

4. The amortization tradeoff

A QML model is trained once on a dataset of $N = 10^9$ classical data points and then queried a million times for inference. With angle encoding, what fraction of total runtime is data-loading?

Show answer

Per-query runtime: $O(N) = 10^9$ for loading + $O(\log N) \approx 30$ for the quantum part. Total per query: $\sim 10^9$ — loading dominates by 8 orders of magnitude. Total over 1M queries: $\sim 10^{15}$ operations, all loading. Loading is essentially 100% of the runtime in any scenario where you have to reload the data per query. The only way to recover usable runtime is to not reload — store the loaded state in QRAM (not available) or train+infer once per loaded state (not the standard ML pattern). This is why classical-data QML inference is structurally slow even when the algorithm itself is fast.

Where this goes next

Tutorial 44 covers quantum generative models — born machines and quantum GANs. They share QML’s structural challenges but have a cleaner argument for at least expressivity-class quantum advantage that doesn’t depend on classical-to-quantum data loading.

The three encodings

Basis encoding

Amplitude encoding

Angle encoding

The QRAM construction

Why this matters

When the bottleneck does not bind

1. Quantum-native data

2. Sparse data

3. Restart-amortized data

Practical encoding choices in 2026

A small encoding-cost calculator

Common misconceptions

Decision rule

Exercises

1. Basis vs amplitude encoding

2. When sparse data helps

3. Why QRAM doesn’t fit on a 1000-qubit machine

4. The amortization tradeoff

Where this goes next

Quantum, for people who already code.