quantum ml advanced · 16 min read · April 29, 2026

Quantum Generative Models: Born Machines, Quantum GANs, and the Sampling-Class Advantage

Quantum generative models — born machines, quantum generative adversarial networks (QGANs) — sample from probability distributions defined by quantum states. Unlike classification or regression QML, generative QML has a clean structural quantum-advantage argument: there exist distributions that quantum circuits can efficiently sample from but classical algorithms cannot. This tutorial covers born machines, QGANs, the sampling-class complexity argument, and the regimes where quantum generative models can plausibly beat classical alternatives.

Prerequisites: Tutorial 41: Tang Dequantization, Tutorial 42: Quantum Convolutional Neural Networks

Tutorial 41 and 43 painted a pessimistic picture of quantum machine learning on classical data. Tutorial 42 covered QCNNs as a structural escape route via quantum-data inputs. This tutorial covers a different escape: quantum generative models. Their advantage rests not on input-data structure but on sampling-class complexity — the well-established result that there exist probability distributions that quantum circuits can sample from in polynomial time but classical algorithms cannot, under standard complexity assumptions.

This is the same complexity-theoretic foundation that underlies quantum-supremacy demonstrations (Google’s 2019 random circuit sampling, USTC’s Jiuzhang Gaussian boson sampling). Sampling tasks are easier to argue about than classification or regression because the relevant computational object is a probability distribution, not a function — and we have stronger lower bounds on classical sampling than we do on classical function evaluation.

Quantum generative models put this complexity argument to work. Born machines sample from distributions defined by quantum-state amplitudes; quantum GANs train a born-machine generator against a discriminator; specialized variants exploit specific complexity-classes (IQP-class, BosonSampling-class) for which classical sampling is known to be hard.

This tutorial covers the architecture, the sampling-class advantage argument, the GAN training story, and a decision rule for when quantum generative models make sense versus when classical generative models suffice.

The Born rule as a probabilistic model

Quantum mechanics’ Born rule says that measuring a state $|\psi\rangle = \sum_x \alpha_x |x\rangle$ in the computational basis gives outcome $x$ with probability $|\alpha_x|^2$ . This is the definition of a probability distribution from a quantum state.

A born machine is a parameterized quantum circuit $U(\boldsymbol{\theta})$ acting on $|0\rangle^{\otimes n}$ to produce a state $|\psi(\boldsymbol{\theta})\rangle$ . Measuring this state samples from the distribution

p_{\boldsymbol{\theta}}(x) \;=\; |\langle x | \psi(\boldsymbol{\theta}) \rangle|^2.

This is a fully-fledged probabilistic model: parameters $\boldsymbol{\theta}$ , distribution $p_{\boldsymbol{\theta}}$ , samples obtained by running the circuit and measuring. Trainable by adjusting $\boldsymbol{\theta}$ to minimize a divergence between $p_{\boldsymbol{\theta}}$ and some target distribution.

The unique property: the probability distribution is defined by quantum amplitudes. A classical generative model (e.g., a Boltzmann machine, a normalizing flow) defines its probability distribution by classical-tractable functions — energies, neural-network outputs, autoregressive products. A born machine defines it by quantum amplitudes that, in general, are not classically tractable.

This is the structural seed of the advantage argument: born-machine distributions, in some regimes, cannot be efficiently sampled from classically.

The sampling-class advantage

The sampling-class advantage is rooted in computer-science complexity theory. Three results together:

IQP circuits are classically hard to sample from (Bremner-Jozsa-Shepherd 2010). Instantaneous Quantum Polynomial-time (IQP) circuits — circuits made of commuting gates between layers of Hadamards — produce distributions that no efficient classical algorithm can sample from, under standard complexity assumptions (the polynomial hierarchy doesn’t collapse).
BosonSampling is classically hard (Aaronson-Arkhipov 2011). Sampling photons through a linear-optical network produces a distribution whose classical sampling is also believed hard. This was the basis of USTC’s Jiuzhang demonstrations.
Random circuit sampling is classically hard at large enough size (Boixo et al. 2018, formalized as the support of Google’s 2019 supremacy demonstration).

A born machine implementing any of these classes inherits the classical-sampling-hardness of its class. Born-machine generators, with appropriate parameterizations, can sample from distributions that classical generative models cannot match.

This is the structural argument for quantum generative QML. It does not depend on input-data access models (no QRAM needed; the model is generated, not loaded). It does not depend on barren-plateau-immune ansätze (born machines have their own training challenges, but the complexity argument is independent). And it does not depend on the dataset; the comparison is “what quantum circuits can sample” vs “what classical algorithms can sample,” which is a model-class comparison, not a data-access comparison.

Born machine training

Training a born machine: adjust $\boldsymbol{\theta}$ so $p_{\boldsymbol{\theta}}$ approximates a target distribution $p_\text{target}$ .

The standard objective: minimize the Maximum Mean Discrepancy (MMD) between samples from $p_{\boldsymbol{\theta}}$ and samples from $p_\text{target}$ . MMD is a kernel-based two-sample distance that is naturally suited to quantum samples — it is computable from samples alone, without explicit access to $p_{\boldsymbol{\theta}}$ or $p_\text{target}$ values.

The MMD loss:

L_{\text{MMD}}(\boldsymbol{\theta}) \;=\; \mathbb{E}_{x, x' \sim p_{\boldsymbol{\theta}}}[k(x, x')] - 2 \mathbb{E}_{x \sim p_{\boldsymbol{\theta}}, y \sim p_\text{target}}[k(x, y)] + \mathbb{E}_{y, y' \sim p_\text{target}}[k(y, y')],

where $k$ is a kernel (Gaussian, polynomial, etc.). The gradient of MMD with respect to $\boldsymbol{\theta}$ can be computed via parameter-shift, making the training implementable on quantum hardware.

Born machines have been demonstrated for:

Sampling from low-dimensional Gaussian mixtures (small-scale demos).
Sampling from molecular configuration distributions.
Modeling specific quantum-mechanical distributions where the target is itself a quantum state.

Limitations: born-machine training inherits some of variational QML’s training challenges. Barren plateaus can apply to deeply parameterized born machines. The MMD gradient has shot-noise issues at large qubit counts. Practical demonstrations have stayed in the 20-50-qubit regime.

Quantum generative adversarial networks

Quantum GANs (Lloyd-Weedbrook 2018, Dallaire-Demers-Killoran 2018) borrow the GAN training framework: a generator and discriminator competing.

The setup:

Generator: a born machine $G_{\boldsymbol{\theta}}$ that produces samples.
Discriminator: can be classical (a classical neural network that takes a sample and predicts “real or generated”) or quantum (a parameterized circuit that takes a sample state and outputs a probability).

Training: the discriminator is trained to distinguish generator samples from real-data samples; the generator is trained to fool the discriminator. The dance produces a generator whose distribution converges to the data distribution.

Three flavors:

QGAN with classical data and classical discriminator: the generator is quantum, but the data is classical. Mostly proof-of-concept; not the strong-advantage regime.
QGAN with quantum data and quantum discriminator: both halves are quantum. Used for tasks like learning to generate states from a quantum experiment. This is the cleanest structural-advantage regime.
QGAN as a state-preparation method: train the generator to produce a specific known target state (e.g., the ground state of a Hamiltonian). This is closer to VQE than to generative modeling per se.

QGAN training is empirically harder than classical GAN training due to the noise of quantum sampling and the smaller per-iteration data each gradient step provides. As of 2026, QGANs remain an active research area without a dominant commercial application.

The IQP born-machine specifically

Born machines built from IQP circuits are particularly interesting. The IQP architecture:

Apply Hadamard to all qubits.
Apply a layer of commuting diagonal gates parameterized by $\boldsymbol{\theta}$ .
Apply Hadamard to all qubits.
Measure.

The resulting distribution is provably classically hard to sample from (under standard complexity assumptions). An IQP born machine therefore samples from a distribution that no classical algorithm can match.

This is a strong structural property. It means an IQP born machine, properly trained, gives you genuine quantum advantage — not as a benchmark claim, but as a complexity-theoretic theorem.

The catch: training an IQP born machine to match an arbitrary target distribution is hard. The classical-hardness theorem says IQP samples are hard to imitate; it does not say it is easy to train an IQP machine to produce a desired target. In practice, IQP born machines are most useful when the target distribution is itself in the IQP-accessible class — i.e., when you are matching a complexity-class-restricted target.

A small born-machine example

Concrete code showing a tiny born machine training to match a 1-qubit target:

import numpy as np
import pennylane as qml
from pennylane import numpy as pnp

n_qubits = 3
dev = qml.device("default.qubit", wires=n_qubits, shots=1000)

@qml.qnode(dev)
def born_machine(weights):
    """Born machine sampling from |psi(weights)|^2."""
    for layer in range(3):
        for q in range(n_qubits):
            qml.RY(weights[layer, q, 0], wires=q)
            qml.RZ(weights[layer, q, 1], wires=q)
        for q in range(n_qubits - 1):
            qml.CNOT(wires=[q, q + 1])
    return qml.sample(wires=range(n_qubits))


def mmd_loss(weights, target_samples, kernel_sigma=1.0):
    """Maximum mean discrepancy between born-machine and target samples.
    Uses Gaussian kernel for simplicity.
    """
    bm_samples = born_machine(weights)
    bm_ints = np.array([int("".join(str(b) for b in s), 2) for s in bm_samples])
    tgt_ints = np.array([int("".join(str(b) for b in s), 2) for s in target_samples])

    def kernel(x, y):
        return np.exp(-((x - y)**2) / (2 * kernel_sigma**2))

    n = len(bm_ints)
    m = len(tgt_ints)
    bm_bm = np.mean([kernel(bm_ints[i], bm_ints[j]) for i in range(n) for j in range(n)])
    bm_tgt = np.mean([kernel(bm_ints[i], tgt_ints[j]) for i in range(n) for j in range(m)])
    tgt_tgt = np.mean([kernel(tgt_ints[i], tgt_ints[j]) for i in range(m) for j in range(m)])

    return bm_bm - 2 * bm_tgt + tgt_tgt


# Target: a uniform distribution over 8 states (a 3-bit integer).
np.random.seed(0)
target_samples = [tuple(np.random.randint(0, 2, n_qubits)) for _ in range(500)]

# Initial random weights.
weights = pnp.array(np.random.uniform(-0.3, 0.3, (3, n_qubits, 2)), requires_grad=True)

# Compute initial MMD loss.
print(f"Initial MMD: {mmd_loss(weights, target_samples):.4f}")

This is a stripped-down sketch. A real born-machine training run would use parameter-shift gradients on the MMD, optimize for hundreds of steps, and require careful hyperparameter tuning. The point is to show the structure: parameterized circuit + sample + MMD loss + gradient-based training.

Common misconceptions

“Born machines are just like classical generative models with extra steps.” No — they sample from quantum-amplitude-defined distributions that classical algorithms cannot match in the IQP-class regime. The structural difference is real, even if practical demonstrations are still small.

“Quantum supremacy demonstrations are quantum machine learning.” Different goals. Quantum supremacy demos sample from circuits designed to be hard for classical sampling, with no learning involved. Quantum generative models combine that hardness with training to match a target distribution — sampling-class advantage applied to a learning task.

“QGANs work like classical GANs with quantum hardware.” Architecturally similar but practically different. The training instability of GANs (mode collapse, balance issues between generator and discriminator) is amplified in quantum GANs by the additional shot noise of quantum sampling. QGANs are research-grade, not production-grade.

“Born machines solve the QML data-loading bottleneck.” They sidestep it for unconditional generation (sample without input data). For conditional generation (sample from $p(x | y)$ given input $y$ ), the data-loading bottleneck reappears for the conditioning data. The structural advantage is cleanest for unconditional generative tasks.

“All born-machine distributions are classically hard to sample.” False. Born machines built from Clifford-only circuits (tutorial 25) are classically simulable via Gottesman-Knill — their distributions are uniform over stabilizer states, with classically tractable sampling. The IQP-class or random-circuit-class subset of born machines is the structurally hard regime.

Decision rule

Use a quantum generative model when:

The target distribution is structured to favor quantum models. Distributions arising from quantum systems (chemistry, physics, lattice models) or from sampling-class problems (IQP-style, BosonSampling-style).
You need unconditional generation. Samples from a fixed-target distribution; no per-sample input data to load.
You can tolerate moderate sample counts and shot-noise-limited gradients. Born machine training is sample-hungry.
You are working at a research-grade scale. Production deployment of quantum generative models is not yet a thing.

Use a classical generative model when:

You have classical data inputs. The data-loading bottleneck makes quantum-conditional models unattractive for now.
You need state-of-the-art generative quality. Classical normalizing flows, diffusion models, and large language models far outperform any current quantum generative model on natural-image, text, or audio generation.
Production reliability matters. Classical generative models have years of engineering maturity; quantum ones do not.

The interesting frontier in 2026 is small-scale quantum generative models on quantum-physics-target distributions, where the structural advantage is clearest and the practical demonstrations have a chance of beating classical baselines. Larger-scale and more general-purpose quantum generative modeling is research-stage.

Exercises

1. Why classification QML and generative QML have different stories

Classification QML (tutorials 16, 17, 41) is dominated by classical baselines. Generative QML (this tutorial) has structural advantage arguments. Why are the stories different?

Show answer

Classification asks “given an input, predict an output” — both input and output are classical, the relevant complexity is around function evaluation, and the input-data-access model determines whether quantum methods can exploit any structural advantage. Generative tasks ask “produce a sample from a distribution” — there is no input data per sample (in unconditional generation), and the relevant complexity is sampling complexity, where quantum lower bounds are stronger and structural arguments are cleaner. The asymmetry between function-evaluation complexity (which is mostly classical-tractable for relevant ML problems) and sampling complexity (where quantum has provable separations) is the structural reason. Classification is a cousin of function evaluation; generation is a cousin of sampling.

2. The IQP advantage in practice

An IQP born machine is provably hard for classical algorithms to imitate. But IQP-distribution targets are unusual in real-world data. What is the practical implication?

Show answer

The IQP advantage is a complexity-theoretic statement about what classical algorithms cannot do. It does not by itself give you a useful generative model unless you have a target distribution that is genuinely in the IQP class. Most real data is not in the IQP class — natural images follow distributions that are neither IQP-hard nor IQP-easy in any clean sense. The IQP advantage is therefore most valuable for quantum-grounded targets: distributions arising from quantum experiments, quantum simulations, or other quantum sources. For natural data, the IQP advantage is theoretical leverage, not a practical advantage. This is why most QGAN demonstrations target physics-grounded or synthetic distributions, not natural data.

3. Born machines vs autoregressive models

Classical autoregressive models (like LSTMs, transformers) sample by sequentially generating each output token conditioned on previous tokens. Why is this not as expressive as a born machine?

Show answer

Autoregressive models factor the joint distribution as $p(x_1, x_2, \ldots, x_n) = p(x_1) p(x_2 | x_1) \cdots p(x_n | x_1, \ldots, x_{n-1})$ . Each conditional $p(x_i | x_{<i})$ is computed by a polynomial-time classical neural network. Therefore the entire distribution is polynomially-sampleable classically. Born machines, by definition, can produce distributions that are not polynomially-sampleable classically (in the IQP regime, by complexity-theoretic argument). The structural difference is exactly the sampling-class hardness gap. This does not say born machines are better on natural data — autoregressive models have built-in inductive biases for sequential data that born machines lack — but it does say they have access to a strictly larger model class.

4. When QGANs beat classical GANs in 2026

Hypothetically, you are training a generative model for ground-state samples of a 2D Heisenberg antiferromagnet (a quantum-physics-target distribution). Pick QGAN or classical GAN and justify.

Show answer

For a quantum-physics target, QGAN has structural advantages: (a) the target distribution is naturally expressed as a quantum state, so quantum sampling can match it without classical-to-quantum loading; (b) the entanglement structure of the target is captured by the QGAN’s quantum circuits, while a classical GAN must learn the entanglement implicitly through correlations in samples; (c) the training metric (Hamiltonian expectation, fidelity) is more naturally measured on quantum samples. For this specific target, QGAN should outperform classical GAN at sufficient scale. But the practical state of QGAN training in 2026 limits demonstrations to small system sizes (~20-50 qubits) where classical methods can still simulate the target efficiently. Even with structural advantage, the practical breakeven where QGAN clearly beats classical methods on this kind of target has not been reached. The argument is theoretical leverage, not deployable-product reality. By 2027-2028, with better hardware, this may change.

Where this goes next

This concludes the four-tutorial QML deepening (41-44). The quantum-ml track now has 7 tutorials covering both the empirical (tutorial 17 skeptic’s bench) and theoretical (tutorials 41-44) sides of QML. Combined, they give a complete map of which QML claims are honest and which are aspirational. Future QML tutorials will track the field’s evolving frontier: new dequantization-resistant problem classes, fault-tolerant QML algorithms (HHL with block encoding), and the emerging quantum-data ecosystem from sensors and simulators.

The Born rule as a probabilistic model

The sampling-class advantage

Born machine training

Quantum generative adversarial networks

The IQP born-machine specifically

A small born-machine example

Common misconceptions

Decision rule

Exercises

1. Why classification QML and generative QML have different stories

2. The IQP advantage in practice

3. Born machines vs autoregressive models

4. When QGANs beat classical GANs in 2026

Where this goes next

Quantum, for people who already code.