Module 8 · Noise and error correction · Lesson 4

The surface code

The error-correcting code that every major hardware team is actually trying to build. High thresholds, 2D layout, and a path to practical fault tolerance.

8 min read · Lesson 31 of 32

If you follow any of the big quantum computing roadmaps — Google’s, IBM’s, AWS’s, PsiQuantum’s — you’ll notice they all converge on one specific error-correcting code: the surface code. Despite having worse asymptotic overhead than some alternatives, the surface code has won the hardware race for three concrete reasons:

  1. Its error threshold is unusually high — around 1% per gate — which is the easiest target for current physical qubits to beat.
  2. It requires only nearest-neighbor interactions on a 2D lattice, which is natural for chip-based architectures.
  3. Its syndrome measurements are simple enough to implement as repeated cycles without compounding errors too quickly.

If quantum error correction succeeds in bringing a fault-tolerant quantum computer into existence, the surface code is almost certainly how.

The idea, visually

Picture a 2D grid of physical qubits. The qubits sit on the edges of a square lattice. At each vertex of the lattice, you measure the product of X operators on the four edges meeting at that vertex. At each face, you measure the product of Z operators on the four edges surrounding that face.

Those are the syndrome operators. They commute with each other and with the logical operators, so measuring them repeatedly does not disturb the logical qubit.
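That commutation property is easy to verify in the binary (symplectic) representation of Pauli operators: write an n-qubit Pauli as a pair of bit vectors (x | z), and two Paulis commute exactly when their symplectic inner product is even. Here is a toy check (this is an illustrative sketch, not a surface-code library; the edge labels are made up):

```python
# Toy symplectic commutation check: an n-qubit Pauli is a pair of bit
# vectors (x, z); two Paulis commute iff x1·z2 + x2·z1 is even.

def commutes(p1, p2):
    x1, z1 = p1
    x2, z2 = p2
    sym = sum(a & b for a, b in zip(x1, z2)) + sum(a & b for a, b in zip(x2, z1))
    return sym % 2 == 0

n = 7  # label a handful of lattice edges 0..6 (hypothetical layout)

def x_stab(edges):  # product of X on the given edges
    return ([1 if i in edges else 0 for i in range(n)], [0] * n)

def z_stab(edges):  # product of Z on the given edges
    return ([0] * n, [1 if i in edges else 0 for i in range(n)])

# A vertex X-check and a face Z-check always share an even number of
# edges (here: two), so they commute.
vertex = x_stab({0, 1, 2, 3})
face   = z_stab({2, 3, 4, 5})
print(commutes(vertex, face))     # True

# A single-qubit Z error on edge 0 overlaps the vertex check on exactly
# one edge, so it anticommutes -- that's what flips the syndrome bit.
z_error = z_stab({0})
print(commutes(vertex, z_error))  # False
```

The second check is the whole detection mechanism in miniature: errors anticommute with the adjacent stabilizers and nothing else, so they show up as local syndrome flips.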

Errors on a single physical qubit flip the syndrome values at the two adjacent stabilizers. The pattern of flipped syndromes across the whole lattice forms an error graph. A classical decoder (typically a minimum-weight perfect matching algorithm) finds the most-likely chain of errors producing the observed syndromes and outputs a correction.
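The matching step can be sketched with a deliberately tiny brute-force version. This is a 1D simplification with made-up defect positions, not a production decoder (real decoders run minimum-weight perfect matching on a 2D or 3D syndrome graph, efficiently): defects must be paired up, each pairing costs the length of the error chain that would connect it, and the decoder picks the cheapest total pairing.

```python
from itertools import permutations

# Toy MWPM sketch: defects sit at integer positions on a line; pairing
# two defects costs their distance (the length of the connecting error
# chain). Brute-force the minimum-total-weight pairing.

def min_weight_pairing(defects):
    """Return (cost, pairs) for the cheapest pairing of an even number of defects."""
    if not defects:
        return 0, []
    best_cost, best_pairs = float("inf"), None
    for perm in permutations(defects):
        pairs = [(perm[i], perm[i + 1]) for i in range(0, len(perm), 2)]
        cost = sum(abs(a - b) for a, b in pairs)
        if cost < best_cost:
            best_cost, best_pairs = cost, pairs
    return best_cost, best_pairs

# Defects at 1, 2, 7, 9: pairing (1,2) with (7,9) costs 1 + 2 = 3,
# beating the crossed pairing (1,7), (2,9) which costs 6 + 7 = 13.
cost, pairs = min_weight_pairing([1, 2, 7, 9])
print(cost)  # 3
```

Brute force is factorial-time, which is why real decoders use proper matching algorithms (Blossom-based MWPM, or faster approximations such as union-find) instead.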

The key properties to carry forward: a high threshold, purely local stabilizer measurements, and reliability that improves with lattice size. The next section unpacks that last point.

Scaling: how to get a better logical qubit

In the surface code, you make a logical qubit more reliable by making the lattice larger. A d × d lattice (where d is the “code distance”) can correct any set of errors of weight up to ⌊(d − 1)/2⌋. Larger d means a lower logical error rate (exponentially lower, as long as the physical error rate stays below threshold), at the cost of quadratically more physical qubits.

To get a logical error rate of 10⁻¹² per gate (enough to run Shor on a 2048-bit integer), estimates suggest you need d ≈ 25–35, which means around 1,000 physical qubits per logical qubit. And you need several thousand logical qubits for the factoring algorithm to actually run. Total: roughly 20 million physical qubits.

That is, at time of writing, well beyond what any existing hardware can produce — current chips are in the 1000-qubit range, and most of those aren’t of high enough quality. But the roadmap is clear: every major player is targeting million-qubit chips over the next 5–15 years, and the surface code is what they plan to run on them.

Logical operations

The surface code supports logical gates via a combination of lattice surgery (or code deformation/braiding) for Clifford operations, and magic state injection for the non-Clifford T gate.

Magic state distillation is the main bottleneck. Current estimates suggest that 90% or more of the physical qubits in a fault-tolerant machine will be dedicated to producing magic states for T-gate implementation. This is why “low T-count” is a headline metric for fault-tolerant quantum algorithms.

Where we actually are

As of 2023–2024, multiple labs have demonstrated small surface codes (and close relatives) running repeated syndrome-extraction cycles, with logical error rates that drop as the code distance grows: the first experimental signs of below-threshold operation.

None of these is yet a “fault-tolerant quantum computer” in the sense of running Shor on a useful integer. But they are the first steps past the threshold, and they’re being built with the surface code (or close cousins) in mind.

Quick check
What's the error threshold of the surface code, and why is it important?
Quick check
Why is the surface code favored by hardware teams over theoretically better codes?
Quick check
What is 'magic state distillation'?

Module done — and where error correction sits

You now have the full arc: noise is real, single-qubit errors can be digitized into X and Z types, repetition codes handle one type, concatenated codes handle both, and the surface code is the practical workhorse. The quantum threshold theorem says scalable fault-tolerant computing is possible in principle. The engineering challenge is building it.

Module 9 briefly covers how real qubits are actually built — the physical substrates everything in this course is supposed to run on.