Quantum Outpost

Falcon (FN-DSA): The Compact Lattice Signature Standard

Falcon, specified by NIST as FN-DSA (FIPS 206), is a post-quantum signature scheme built from NTRU lattices and floating-point Gaussian sampling. Its signatures are roughly 3.6x smaller than ML-DSA’s at comparable security (666 bytes for Falcon-512 versus 2,420 bytes for ML-DSA-44), but the implementation is much harder: constant-time Gaussian sampling is notoriously subtle. This tutorial covers the math, the implementation pitfalls, and when Falcon is the right post-quantum signature choice.

Prerequisites: Tutorial 22: ML-KEM and ML-DSA in Practice

NIST’s post-quantum cryptography standardization selected three signature schemes at the end of its third round: ML-DSA (FIPS 204, lattice-based, the default choice), SPHINCS+ (standardized as SLH-DSA in FIPS 205, hash-based, the conservative stateless backup), and Falcon (FN-DSA, FIPS 206, NTRU-lattice-based, the compact-signature option). Tutorial 22 covered ML-DSA in detail. This tutorial covers Falcon: what it does well, where it is hard, and when to pick it over ML-DSA.

The headline number: Falcon signatures are ~3.6× smaller than ML-DSA signatures at a comparable security level. A Falcon-512 signature is 666 bytes; an ML-DSA-44 (Dilithium2) signature is 2,420 bytes (2,420 / 666 ≈ 3.6). For protocols where signatures are transmitted often or stored long-term, this size difference compounds.

The cost: Falcon’s signing operation requires constant-time Gaussian sampling over discrete lattices, which is one of the trickiest implementation problems in post-quantum cryptography. Naive implementations leak side-channel information that can break the scheme entirely. Several published “constant-time Falcon” implementations have had subtle bugs that leaked the secret key.

This tutorial covers Falcon’s math (NTRU lattices, the trapdoor sampling), the implementation pitfalls, the security argument, and a decision rule for picking Falcon vs ML-DSA in real protocols.

NTRU lattices, briefly

Falcon is built on the NTRU lattice problem. Roughly: given a polynomial ring $R = \mathbb{Z}[X]/(X^n + 1)$ with $n$ a power of 2, and a public polynomial $h \in R/q$ for some prime $q$, the NTRU problem is to find polynomials $f, g \in R$ with small coefficients such that $h \equiv g/f \pmod q$.

The “small coefficients” condition is what makes the problem hard. Computing $g/f$ for any $f, g$ with $f$ invertible mod $q$ is easy modular arithmetic; recovering a small pair $(f, g)$ from $h$ is a lattice shortest-vector problem and is believed to be quantum-hard.

Falcon’s key generation:

  1. Sample small polynomials $f, g \in R$ from an appropriate Gaussian distribution.
  2. Compute $h = g \cdot f^{-1} \pmod{q}$. This is the public key.
  3. Compute additional polynomials $F, G$ such that $fG - gF = q$ (the NTRU equation). The tuple $(f, g, F, G)$ is the secret key (a basis of a special lattice).
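
Steps 1 and 2 can be tried out at toy scale. The sketch below is a minimal illustration, not Falcon's key generation: it assumes a tiny degree n = 8, uses a rounded continuous Gaussian as a stand-in for the specified sampler, and inverts f by plain Gaussian elimination rather than with the NTT. The helper names (`ring_mul`, `ring_inv`, `toy_keygen`) are hypothetical, and step 3 (solving the NTRU equation) is left out.

```python
import random

Q = 12289  # the modulus Falcon uses
N = 8      # toy degree (assumption); Falcon uses 512 or 1024


def ring_mul(a, b, n=N, q=Q):
    """Multiply in Z_q[X]/(X^n + 1): X^n wraps around to -1."""
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                out[k] = (out[k] + ai * bj) % q
            else:
                out[k - n] = (out[k - n] - ai * bj) % q
    return out


def ring_inv(f, n=N, q=Q):
    """Solve f * x = 1 in the ring by Gaussian elimination mod q.

    Returns None when f is not invertible.
    """
    # Column j of the linear system is the coefficient vector of f * X^j.
    cols = []
    for j in range(n):
        x_j = [0] * n
        x_j[j] = 1
        cols.append(ring_mul(f, x_j, n, q))
    # Augmented matrix [A | e_0], one row per output coefficient.
    m = [[cols[j][i] for j in range(n)] + [1 if i == 0 else 0] for i in range(n)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        inv = pow(m[col][col], -1, q)  # modular inverse (Python 3.8+)
        m[col] = [(v * inv) % q for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [(vr - factor * vc) % q for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def toy_keygen(rng):
    """Sample small f, g and return (f, g, h) with h = g * f^{-1} mod q."""
    while True:
        # Stand-in for a discrete Gaussian: rounded continuous Gaussian.
        f = [round(rng.gauss(0, 1.5)) for _ in range(N)]
        g = [round(rng.gauss(0, 1.5)) for _ in range(N)]
        f_inv = ring_inv(f)
        if f_inv is not None:
            return f, g, ring_mul(g, f_inv)
```

Multiplying the returned `h` by `f` with `ring_mul` gives back `g` reduced mod q, which is exactly the relation $h \equiv g/f \pmod q$ that the public key encodes.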

The public key is one polynomial; the secret key is a basis of an NTRU lattice. The mathematical structure of $(f, g, F, G)$ gives a trapdoor: a specific algorithm that uses these polynomials to find short lattice vectors near any target, without solving the general NTRU problem.
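
A short derivation shows why $(f, g, F, G)$ really is a basis of the lattice defined by $h$. The row convention used here is one common way to write it down and is an assumption, not something fixed by the text above.

```latex
% Lattice defined by the public key h:
\Lambda_h = \{ (u, v) \in R^2 : u + v\,h \equiv 0 \pmod q \}

% Both secret rows lie in \Lambda_h, using h \equiv g f^{-1} and fG - gF = q:
g + (-f)\,h \equiv g - f \cdot g f^{-1} \equiv 0 \pmod q
f\,\bigl(G + (-F)\,h\bigr) \equiv fG - gF = q \equiv 0 \pmod q
  \;\Longrightarrow\; G - F h \equiv 0 \pmod q \quad (f \text{ invertible mod } q)

% The secret basis and the public basis have the same determinant:
\det \begin{pmatrix} g & -f \\ G & -F \end{pmatrix} = fG - gF = q
\qquad
\det \begin{pmatrix} q & 0 \\ -h & 1 \end{pmatrix} = q
```

The secret rows therefore generate a sublattice of $\Lambda_h$ with the same determinant as $\Lambda_h$ itself, so they generate all of it. The difference is quality: the public rows contain the large entries $q$ and $h$, while the secret rows are short, which is what the trapdoor exploits.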

How Falcon signs

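To sign a message $m$:

  1. Hash $m$, together with a fresh random salt, to a target polynomial $c \in R/q$, viewed as a point in $\mathbb{Z}^{2n}$ that is generally not on the NTRU lattice.
  2. Use the trapdoor (the secret key) to find a lattice point close to that target. The difference is a short vector $s = (s_1, s_2)$ satisfying $s_1 + s_2 h \equiv c \pmod q$.
  3. The signature is the salt plus $s_2$, encoded compactly ($s_1$ can be recomputed from $c$, $s_2$, and $h$).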
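
As a sketch of step 1, the function below follows the shape of the hash-to-point step in the Falcon specification: SHAKE-256 over a salt and the message, read as 16-bit big-endian values, rejecting anything at or above 5q = 61,445 so that the reduced value is uniform mod q. Treat these details as assumptions to check against the specification; the function name and the buffer-doubling strategy are illustrative only.

```python
import hashlib

Q = 12289  # the modulus Falcon uses


def hash_to_point(salt, message, n):
    """Map (salt, message) to n coefficients in [0, Q) without modulo bias."""
    k = (1 << 16) // Q          # 5: largest multiple of Q below 2^16 is k * Q
    out_len = 4 * n             # usually enough; doubled below if not
    while True:
        # SHAKE-256 is an extendable-output function, so a longer digest
        # is a strict extension of a shorter one.
        stream = hashlib.shake_256(salt + message).digest(out_len)
        coeffs = []
        for i in range(0, out_len - 1, 2):
            t = (stream[i] << 8) | stream[i + 1]
            if t < k * Q:       # reject the biased tail of the 16-bit range
                coeffs.append(t % Q)
                if len(coeffs) == n:
                    return coeffs
        out_len *= 2
```

Because the salt is part of the hash input, signing the same message twice produces two different targets, which is why the salt has to travel with the signature.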

To verify:

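  1. Recompute the hash target $c$ from the message and the salt carried in the signature.
  2. Recover $s_1 = c - s_2 h \bmod q$ using the public key $h$, and check that $(s_1, s_2)$ is short, meaning its squared norm is below the scheme’s acceptance bound.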
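
The verification check can be sketched in a few lines. This is a toy under stated assumptions: degree 8 instead of 512, schoolbook multiplication instead of the NTT, and a made-up acceptance bound (`TOY_BOUND_SQ`). The example values of `h`, `s1` and `s2` are arbitrary and do not come from a real key pair; they only show that the relation $s_1 + s_2 h \equiv c$ is what the verifier tests.

```python
Q = 12289  # the modulus Falcon uses


def ring_mul(a, b, q=Q):
    """Multiply in Z_q[X]/(X^n + 1), with n taken from the input length."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                out[k] = (out[k] + ai * bj) % q
            else:
                out[k - n] = (out[k - n] - ai * bj) % q
    return out


def center(x, q=Q):
    """Representative of x mod q in the range (-q/2, q/2]."""
    x %= q
    return x - q if x > q // 2 else x


def verify(h, c, s2, bound_sq):
    """Accept iff (s1, s2) with s1 = c - s2*h mod q is short enough."""
    s2h = ring_mul(s2, h)
    s1 = [center(ci - vi) for ci, vi in zip(c, s2h)]
    norm_sq = sum(x * x for x in s1) + sum(x * x for x in s2)
    return norm_sq <= bound_sq


# Toy data (hypothetical): build a target c that a short (s1, s2) satisfies.
TOY_BOUND_SQ = 200
h = [1234, 5678, 42, 9999, 7, 12000, 3141, 2718]
s1 = [1, -2, 0, 3, -1, 2, 0, -3]
s2 = [0, 1, -1, 2, -2, 3, -3, 1]
c = [(a + b) % Q for a, b in zip(s1, ring_mul(s2, h))]
```

With this data `verify(h, c, s2, TOY_BOUND_SQ)` accepts, and shifting any coefficient of `c` by a large amount makes the recovered $s_1$ long, so the check rejects. Verification uses only integer arithmetic, which is why it is so much cheaper than signing.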

The math is elegant. The implementation is hard.

The Gaussian sampling problem

The “find a short vector near $c$” step requires sampling from a discrete Gaussian distribution over the lattice. Each step involves:

  1. Computing a target offset.
  2. Sampling integer-valued Gaussian coefficients (specifically, samples from a discrete Gaussian on $\mathbb{Z}$ centered at a real number, with a specific variance).
  3. Combining these into a lattice point near the target.

The challenge: the Gaussian sampling must be constant-time. Side-channel attacks can extract the secret key from timing variations in sampling. A constant-time discrete Gaussian sampler is non-trivial — naive rejection-sampling implementations leak through rejection counts.

Falcon’s reference implementation drives a recursion of integer Gaussian samplers from a precomputed “tree” derived from the secret basis, uses floating-point arithmetic throughout, and takes considerable care to keep the whole signing path constant-time. The implementation is around 4,000 lines of careful C, and several published bugs have shown how easy it is to get wrong.
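
To make the leak concrete, here is a deliberately naive rejection sampler for a discrete Gaussian on the integers. It is a sketch of what not to do, not anything taken from a Falcon implementation: the fixed proposal window and the function names are assumptions chosen for illustration. The point is that the number of loop iterations varies from call to call, and its distribution depends on the standard deviation, which in Falcon is derived from the secret key.

```python
import math
import random


def naive_sample_z(mu, sigma, rng, window=20):
    """Sample z ~ discrete Gaussian(mu, sigma) by rejection. NOT constant-time.

    Returns (z, trials) so the caller can observe the variable running time.
    """
    lo = math.floor(mu) - window
    hi = math.floor(mu) + window
    trials = 0
    while True:
        trials += 1
        z = rng.randint(lo, hi)                       # uniform proposal
        accept_prob = math.exp(-((z - mu) ** 2) / (2 * sigma * sigma))
        if rng.random() < accept_prob:                # data-dependent branch
            return z, trials


def mean_trials(mu, sigma, count, seed=0):
    """Average number of loop iterations over `count` samples."""
    rng = random.Random(seed)
    return sum(naive_sample_z(mu, sigma, rng)[1] for _ in range(count)) / count


def mean_value(mu, sigma, count, seed=0):
    """Empirical mean of the samples, as a sanity check on the distribution."""
    rng = random.Random(seed)
    return sum(naive_sample_z(mu, sigma, rng)[0] for _ in range(count)) / count
```

Comparing `mean_trials(0.3, 1.3, 5000)` with `mean_trials(0.3, 1.8, 5000)` shows the narrower Gaussian needing noticeably more iterations on average. An attacker who can time many signatures sees exactly this kind of signal.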

Concrete sizes

Here are the parameter sizes for the standardized variants:

| Scheme | Public key | Secret key | Signature | Security |
|---|---|---|---|---|
| Falcon-512 | 897 B | 1,281 B | 666 B | NIST level 1 (~AES-128 quantum) |
| Falcon-1024 | 1,793 B | 2,305 B | 1,280 B | NIST level 5 (~AES-256 quantum) |
| ML-DSA-44 | 1,312 B | 2,560 B | 2,420 B | NIST level 2 |
| ML-DSA-65 | 1,952 B | 4,032 B | 3,309 B | NIST level 3 |
| ML-DSA-87 | 2,592 B | 4,896 B | 4,627 B | NIST level 5 |
| SPHINCS+-128s | 32 B | 64 B | 7,856 B | NIST level 1 |

Two takeaways:

  • Falcon has the smallest signatures of the NIST-selected post-quantum signature schemes (at comparable security levels). This is its competitive advantage.
  • Public keys are similar across lattice schemes — small enough for most applications, large compared to elliptic-curve schemes (32 bytes for Ed25519).

For protocols where signatures are sent over the wire (TLS handshakes, blockchain transactions, code-signing), Falcon’s compactness is meaningful. For protocols where signatures are computed offline and stored, the size advantage is less critical.
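
The size comparison is easy to reproduce from the table. The snippet below is a small sketch using only the byte counts listed above; the `chain_bytes` helper and its "one public key plus one signature per certificate" model are simplifying assumptions, not a description of any real certificate format.

```python
# (public key, secret key, signature) sizes in bytes, from the table above.
SIZES = {
    "Falcon-512": (897, 1281, 666),
    "Falcon-1024": (1793, 2305, 1280),
    "ML-DSA-44": (1312, 2560, 2420),
    "ML-DSA-65": (1952, 4032, 3309),
    "ML-DSA-87": (2592, 4896, 4627),
    "SPHINCS+-128s": (32, 64, 7856),
}


def signature_ratio(larger, smaller):
    """How many times bigger `larger`'s signature is than `smaller`'s."""
    return SIZES[larger][2] / SIZES[smaller][2]


def chain_bytes(scheme, chain_length):
    """Bytes of key and signature material in a chain of `chain_length` links.

    Simplified model: each link carries one public key and one signature.
    """
    public_key, _, signature = SIZES[scheme]
    return chain_length * (public_key + signature)


if __name__ == "__main__":
    print(f"ML-DSA-44 / Falcon-512:  {signature_ratio('ML-DSA-44', 'Falcon-512'):.2f}x")
    print(f"ML-DSA-87 / Falcon-1024: {signature_ratio('ML-DSA-87', 'Falcon-1024'):.2f}x")
    for name in ("Falcon-512", "ML-DSA-44"):
        print(f"{name}: {chain_bytes(name, 3)} bytes for a 3-link chain")
```

Both ratios come out at about 3.6, and in the three-link model Falcon-512 needs 4,689 bytes against 11,196 for ML-DSA-44, which is the kind of difference that shows up in handshake sizes.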

Speed comparison

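Performance varies substantially across implementations and platforms, and Falcon’s signing cost in particular depends on how floating-point arithmetic is provided. Treat the figures below (attributed to the SUPERCOP benchmark suite, 2025-vintage) as indicative, and measure on your own target:

| Scheme | Sign (cycles) | Verify (cycles) | Signature size |
|---|---|---|---|
| Falcon-512 | ~28M | ~76K | 666 B |
| ML-DSA-44 | ~600K | ~520K | 2,420 B |

In these figures ML-DSA signs roughly 47× faster than Falcon (28M / 600K), while Falcon verifies roughly 7× faster than ML-DSA (520K / 76K). The asymmetry: Falcon’s complexity is concentrated in signing (Gaussian sampling), while ML-DSA spreads it more evenly.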

For high-throughput signing scenarios (a server signing many messages per second), ML-DSA is often the right choice. For verification-heavy workloads with rare signing (TLS clients, embedded devices verifying firmware), Falcon’s compact signature can be worth the slow signing.

Implementation pitfalls

Falcon’s reference implementation has had multiple security disclosures, most of them tied to the signing path and its Gaussian sampling:

  • Floating-point determinism issues. The reference implementation uses double-precision floating-point for the sampling tree. Different platforms produce slightly different results, and in some cases the timing varies enough to leak information about secret-key components.
  • Side-channel leakage in modular reduction. Reducing intermediate polynomial coefficients modulo $q$ can leak through timing if not done in constant time.
  • Incorrect rejection bounds. The Gaussian sampler must reject some samples to maintain the correct distribution. Bugs in rejection logic have appeared in multiple implementations.

The general lesson: Falcon is much harder to implement correctly than ML-DSA. The cryptographic community now generally recommends Falcon only when the size advantage is critical, and only with carefully audited implementations.

The 2025 NIST guidance: use ML-DSA as the default, fall back to SPHINCS+ for long-term-stable hash-based security, and use Falcon when signature size is the binding constraint and a high-quality vetted implementation is available.

Decision rule

Use Falcon when:

  1. Signature size is critical. Bandwidth-constrained protocols (resource-constrained IoT, blockchain transactions, satellite communications) where every byte matters.
  2. Signing happens offline. If signing is rare and verification is frequent, Falcon’s slow signing is amortized.
  3. You have access to a high-quality, audited Falcon implementation. Rolling your own is dangerous; there are now several vetted libraries (PQClean, liboqs).

Use ML-DSA when:

  1. Signing throughput matters. Server signing many messages per second.
  2. You need the simplest, safest implementation. ML-DSA is structurally easier to implement in constant time, and its implementations have had roughly three years to mature.
  3. Default cryptographic policy is conservative. ML-DSA is the NIST default for a reason.

Use SPHINCS+ when:

  1. You need conservative hash-based security. No lattice assumptions, only hash-function security.
  2. Signature size doesn’t matter. SPHINCS+ has the largest signatures by far (~8 KB for SPHINCS+-128s, more for the other parameter sets).
  3. You expect long-term security across decades. Hash-function security is structurally more conservative than lattice problems for very long horizons.
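
The three lists can be condensed into a small function. This is only one possible reading of the rule: the function name, its parameters, and especially the `high_rate_threshold` cutoff are hypothetical choices made for illustration, and a real policy decision would weigh more factors than four inputs.

```python
def pick_signature_scheme(
    *,
    size_critical,
    signs_per_second,
    audited_falcon_available,
    avoid_lattice_assumptions,
    high_rate_threshold=50,  # hypothetical cutoff for "signing throughput matters"
):
    """Return "SPHINCS+", "Falcon", or "ML-DSA" following the rule above."""
    # Hash-only security wins if lattice assumptions are off the table.
    if avoid_lattice_assumptions:
        return "SPHINCS+"
    # Falcon needs all three: size pressure, rare signing, a vetted library.
    if (
        size_critical
        and audited_falcon_available
        and signs_per_second < high_rate_threshold
    ):
        return "Falcon"
    # Everything else falls back to the default.
    return "ML-DSA"


if __name__ == "__main__":
    firmware = pick_signature_scheme(
        size_critical=True,
        signs_per_second=0.001,
        audited_falcon_available=True,
        avoid_lattice_assumptions=False,
    )
    print("firmware signing:", firmware)
```

Note the ordering: Falcon is only chosen when every one of its conditions holds, so a missing audited implementation sends a size-critical deployment back to ML-DSA rather than to a home-grown Falcon.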

The 2026 production picture: ~80% of new post-quantum deployments use ML-DSA; ~15% use Falcon for size-critical applications; ~5% use SPHINCS+ for stateless conservative use cases.

Common misconceptions

“Falcon is more secure than ML-DSA because it has smaller signatures.” No. Smaller signatures mean a more compact mathematical structure, not stronger security. Each parameter set targets a stated NIST security level (Falcon-512 targets level 1, ML-DSA-44 level 2, and Falcon-1024 and ML-DSA-87 both level 5). Choose based on the right tradeoff for your use case, not on signature size as a security proxy.

“Constant-time Gaussian sampling is solved.” It is well-studied but easy to get wrong in implementation. The 2025 PQShield reference implementations are well-audited; older reference code is not. Use a current, audited library.

“Falcon’s NTRU lattice is broken.” No fundamental break has been published. There are subexponential-time classical algorithms for some NTRU problems with structured parameters, but Falcon’s parameters were chosen to avoid these. As of 2026, Falcon’s mathematical security is intact. The vulnerabilities have been in implementation, not in the underlying problem.

“Falcon can replace ML-DSA in any protocol.” It can replace ML-DSA, but the slow signing and implementation difficulty matter. Most TLS deployments use ML-DSA; some specialized blockchain protocols use Falcon for the signature compactness.

Exercises

1. Why Falcon signatures are smaller

Compare the structure of a Falcon signature (a short lattice vector) and an ML-DSA signature (a vector of polynomial coefficients mod $q$ plus auxiliary data). Why is Falcon’s representation more compact?

Answer

A Falcon signature is a short lattice vector whose coefficients are small integers concentrated around zero, so a variable-length encoding stores each one in only a few bits, and only half of the vector needs to be sent because the verifier can recompute the other half. ML-DSA’s signature is a polynomial vector with larger coefficients and fewer compactness optimizations, plus auxiliary data (a challenge hash and a hint vector). Falcon’s NTRU-lattice structure has more “compressible” output; ML-DSA’s structure trades compression for implementation simplicity. The size factor is roughly 3.6× (2,420 vs 666 bytes at the smallest parameter sets), which is meaningful for bandwidth-constrained applications.
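
The encoding idea can be sketched directly. The code below follows the shape of the compression described in the Falcon specification (a sign bit, the seven low bits of the absolute value, then the remaining high bits in unary), but it works on a string of '0'/'1' characters for readability and should be checked against the specification before being relied on. The function names are illustrative.

```python
def compress(coeffs):
    """Encode small signed integers as a string of '0'/'1' characters."""
    parts = []
    for s in coeffs:
        a = abs(s)
        parts.append("1" if s < 0 else "0")   # sign bit
        parts.append(format(a & 0x7F, "07b"))  # seven low bits
        parts.append("0" * (a >> 7) + "1")     # high bits in unary, 1-terminated
    return "".join(parts)


def decompress(bits, n):
    """Inverse of compress for n coefficients; rejects a negative zero."""
    out, pos = [], 0
    for _ in range(n):
        negative = bits[pos] == "1"
        pos += 1
        low = int(bits[pos:pos + 7], 2)
        pos += 7
        high = 0
        while bits[pos] == "0":
            high += 1
            pos += 1
        pos += 1  # skip the terminating '1'
        a = (high << 7) | low
        if a == 0 and negative:
            raise ValueError("non-canonical encoding of zero")
        out.append(-a if negative else a)
    return out


if __name__ == "__main__":
    sample = [0, 5, -5, 127, 128, -300]
    encoded = compress(sample)
    print(len(encoded), "bits compressed vs", 14 * len(sample), "bits at 14 bits each")
    print(decompress(encoded, len(sample)) == sample)
```

A coefficient with absolute value below 128 costs 9 bits, against the 14 bits needed to store an arbitrary value mod q = 12289. That gap, applied to hundreds of coefficients, is where most of Falcon’s size advantage comes from.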

2. The constant-time challenge

Why is constant-time discrete Gaussian sampling especially difficult compared to constant-time uniform sampling?

Answer

Uniform sampling over $\{0, 1, \ldots, M-1\}$ is straightforward: generate uniform integers, take modulo (or rejection-sample if bias matters). Constant-time uniform sampling is well-understood and library-supported. Discrete Gaussian sampling, by contrast, requires (a) computing the probability of each integer in the support according to the Gaussian PDF, and (b) sampling proportional to those probabilities. Both steps involve floating-point arithmetic or table lookups that can leak through timing. Floating-point operations have data-dependent timing on some architectures, table lookups have data-dependent cache behavior, and any conditional branching to handle edge cases adds more leakage vectors. The accumulated complexity is what makes Falcon’s constant-time implementation an active research area rather than a solved problem.
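
One standard way to remove the data-dependent branch is a cumulative distribution table that is always scanned from start to finish. The sketch below shows that structure for a one-sided (non-negative) Gaussian with a fixed standard deviation. It is an illustration only: Python offers no constant-time guarantees, `random` is not a cryptographic generator, the 32-bit precision is far too coarse for real use, and the function names are hypothetical.

```python
import math
import random


def build_cdt(sigma, tail=10, bits=32):
    """Cumulative thresholds for z = 0, 1, ... with weight exp(-z^2 / (2 sigma^2)).

    Entry i is P(z <= i) scaled to `bits` bits; the last value is implied.
    """
    bound = math.ceil(tail * sigma)
    weights = [math.exp(-(z * z) / (2 * sigma * sigma)) for z in range(bound + 1)]
    total = sum(weights)
    table, running = [], 0.0
    for w in weights[:-1]:
        running += w
        table.append(int((running / total) * (1 << bits)))
    return table


def sample_half_gaussian(table, rng, bits=32):
    """Draw z >= 0 by scanning the whole table, whatever the random input is."""
    u = rng.getrandbits(bits)
    z = 0
    for threshold in table:
        # No early exit: every entry is visited on every call.
        z += int(u >= threshold)
    return z


if __name__ == "__main__":
    table = build_cdt(sigma=2.0)
    rng = random.Random(0)
    draws = [sample_half_gaussian(table, rng) for _ in range(10)]
    print(draws)
```

The loop runs the same number of iterations for every draw, which is the property the naive rejection approach lacks. What makes Falcon harder than this sketch is that its centers and standard deviations change with every coefficient, so a single fixed table is not enough on its own.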

3. When Falcon’s slow signing matters

A blockchain protocol expects nodes to sign ~100 transactions per second. Compute the per-server signing throughput required, and decide whether Falcon’s ~28M-cycle signing is feasible.

Answer

100 signatures/sec × 28M cycles/sig = $2.8 \times 10^9$ cycles/sec required. A modern 3 GHz CPU has $3 \times 10^9$ cycles/sec available, so one core dedicated to signing would run at about 93% load: at the edge of feasibility for Falcon at this throughput. Multi-core or batch signing helps but adds complexity. ML-DSA at ~600K cycles/sig requires only $6 \times 10^7$ cycles/sec, about 2% of one core. For a 100-sig/sec blockchain node, ML-DSA is the clearly better choice from a server-throughput perspective; Falcon’s signature compactness is offset by the throughput cost. Falcon would be a better fit for a system where signature size is more constrained (a small device transmitting over a narrow band).
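
The same arithmetic as a reusable sketch, using the cycle counts quoted in this tutorial and an assumed 3 GHz clock. The function names are illustrative, and the model ignores everything except raw signing cycles.

```python
def core_fraction(signs_per_second, cycles_per_sign, clock_hz=3e9):
    """Fraction of one core consumed by signing at the given rate."""
    return signs_per_second * cycles_per_sign / clock_hz


def max_signing_rate(cycles_per_sign, clock_hz=3e9):
    """Signatures per second one fully dedicated core can produce."""
    return clock_hz / cycles_per_sign


if __name__ == "__main__":
    for name, cycles in (("Falcon-512", 28e6), ("ML-DSA-44", 600e3)):
        load = core_fraction(100, cycles)
        ceiling = max_signing_rate(cycles)
        print(f"{name}: {load:.1%} of a core at 100 sig/s, ceiling {ceiling:.0f} sig/s")
```

With these inputs Falcon-512 tops out at roughly 107 signatures per second per core, so a 100-per-second target leaves almost no headroom for bursts.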

4. Picking the right post-quantum signature for embedded firmware

A firmware signing system signs firmware images once per release (~weekly) and verifies them on millions of devices. Each device has 32 KB of RAM and limited CPU. Pick a signature scheme.

Answer

Constraints: rare signing (weekly), frequent verification (each device on every boot), tight memory. Falcon-512 is the clear choice. The slow signing is amortized over a week; the compact signature (666 bytes) fits in tight memory; Falcon verification is fast enough for embedded CPUs and, unlike signing, needs only integer arithmetic. ML-DSA-44 would also work, but its 2,420-byte signature is ~3.6x larger, which matters in 32 KB RAM. SPHINCS+ is too large (a ~8 KB signature). Falcon is the right choice exactly when the bandwidth/memory constraint is the binding one and signing is rare. This is the canonical “Falcon use case” and explains why it is included in the NIST standards despite being harder to implement.

Where this goes next

Tutorial 50 covers SPHINCS+ — the hash-based signature alternative that doesn’t depend on lattice or NTRU assumptions, providing a structurally different conservative-security path.

