Signal-to-noise ratio & electron dose

Why Cryo-ET images are dominated by noise, how dose limits the signal, and how averaging recovers structure

A raw Cryo-ET image looks like almost pure snow — the real structure is buried in it. Whether you can see it comes down to two things: how strong the signal is relative to the noise, and how many images you can stack and average.

Intuition

Think of imaging as trying to hear one person whisper a sentence in a noisy room. Catch a few words once and you make out almost nothing — the background drowns the voice out. But have the same sentence repeated thousands of times, align each repetition, and stack them: the voice says the same thing every time (signal adds up coherently) while the background differs every time (noise partly cancels). The more you stack, the clearer the sentence. Cryo-ET is exactly this situation — the signal in one image is too weak to see, but the structure is fixed and the noise is random, so “align and stack” slowly pulls the structure out from under the noise.

What SNR is

The signal-to-noise ratio (SNR) measures how much of a measurement is true signal versus random fluctuation. For a signal of amplitude (standard deviation) $\sigma_s$ corrupted by additive noise of standard deviation $\sigma_n$,

$$\text{SNR} = \frac{\sigma_s}{\sigma_n},$$

often reported in decibels as $20\log_{10}(\text{SNR})$. Reading it symbol by symbol: $\sigma_s$ is the size of the signal’s fluctuations (how much the real features in the image actually vary in brightness), $\sigma_n$ is the size of the noise fluctuations (how much the same spot jitters in brightness from random causes), and their ratio tells you how many times larger the “real thing” is than the “junk.” $\text{SNR}=1$ means signal and noise are equal; $\text{SNR}=10$ (i.e. $20\,\text{dB}$) means the signal is ten times the noise.
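As a minimal sketch of the definition above, the snippet below computes an amplitude SNR and its decibel value with NumPy. The sine-wave "signal", the noise level, and the helper names `snr_amplitude` and `snr_db` are illustrative assumptions, not part of any Cryo-ET package.

```python
import numpy as np

def snr_amplitude(signal, noise):
    """Amplitude SNR: std of the signal divided by std of the noise."""
    return np.std(signal) / np.std(noise)

def snr_db(snr):
    """Convert an amplitude SNR to decibels (20 * log10)."""
    return 20.0 * np.log10(snr)

rng = np.random.default_rng(0)
# Hypothetical signal: a sine wave, whose standard deviation is about 0.707.
signal = np.sin(np.linspace(0.0, 20.0 * np.pi, 100_000))
# Hypothetical noise about ten times larger than the signal.
noise = rng.normal(0.0, 7.07, size=signal.size)

snr = snr_amplitude(signal, noise)
print(f"SNR = {snr:.3f} ({snr_db(snr):.1f} dB)")
```

With the noise about ten times the signal, the SNR comes out near 0.1, which is the upper end of the raw-image range quoted in the next paragraph.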

In Cryo-ET the SNR of a raw image is extremely low, frequently well below one — measured values often sit in the $0.01$ to $0.1$ range, meaning the noise is ten to a hundred times larger than the signal. So individual features are not visible by eye and structure emerges only after extensive processing. This is a different world from everyday photography (SNR usually well above one, plain to see), and it is why almost the entire Cryo-ET pipeline is organized around one question: how to extract signal from an extremely low SNR.

Interactive figure: how averaging drives the noise down as the number averaged, N, is varied from 1 to 256 (shown at N = 16, √N = 4.0×). Curves: single measurement, true signal, average of N.

The true signal is fixed; each measurement adds independent noise. Averaging N of them leaves the signal untouched while the noise standard deviation shrinks by √N, so SNR grows as √N. This is exactly why subtomogram averaging recovers structure from images that are individually almost pure noise.
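The behaviour described in the caption can be reproduced with a small simulation. This is a sketch under stated assumptions: the sine-wave signal, the Gaussian noise model, and the noise level `noise_sigma` are hypothetical choices made only to illustrate the √N scaling.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 512)
true_signal = np.sin(2.0 * np.pi * 3.0 * x)  # hypothetical fixed signal
noise_sigma = 5.0                            # assumed per-measurement noise std

def residual_noise_std(n_avg):
    """Std of what remains after averaging n_avg noisy measurements
    and subtracting the known true signal."""
    noise = rng.normal(0.0, noise_sigma, size=(n_avg, x.size))
    measurements = true_signal + noise       # same signal, fresh noise each time
    return np.std(measurements.mean(axis=0) - true_signal)

results = {n: residual_noise_std(n) for n in (1, 16, 256)}
for n, measured in results.items():
    predicted = noise_sigma / np.sqrt(n)
    print(f"N = {n:3d}: residual noise {measured:.3f}, predicted {predicted:.3f}")
```

The measured residual should track `noise_sigma / sqrt(N)` to within sampling error, while the signal term is unchanged by the averaging.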

Why dose is the root cause

The fundamental cause is radiation damage. Biological specimens are destroyed by the electron beam, so the total electron dose must be kept low, and a tilt series has to spread that limited budget (often a few tens to about a hundred electrons per square ångström) across all of its images. For a 41-image tilt series with a total budget of $\sim 100\ e^-/\text{Å}^2$, that leaves only about $2.4\ e^-/\text{Å}^2$ per image, so each image is severely underexposed and very noisy on its own. Electron detection is a counting process, so the number of electrons $N$ recorded per pixel follows Poisson statistics: the signal scales as $N$ while the noise (standard deviation) scales as $\sqrt{N}$, giving

$$\text{SNR} \propto \frac{N}{\sqrt{N}} = \sqrt{N}.$$

Here $N$ is the electron count recorded at a given pixel (proportional to the dose deposited there). A property of the Poisson distribution is that its variance equals its mean, so the standard deviation of the fluctuations is $\sqrt{N}$ — that is where the $\sqrt{N}$ in the noise comes from.
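The arithmetic and the Poisson scaling can be checked numerically. In this sketch the pixel size is a hypothetical value used only to turn a dose into an expected count, and the model ignores detector efficiency and specimen contrast; it shows only the counting statistics.

```python
import numpy as np

total_dose = 100.0   # e-/A^2, the budget used in the text
n_tilts = 41
dose_per_image = total_dose / n_tilts          # about 2.44 e-/A^2

pixel_size = 2.0                               # hypothetical pixel size in A
expected_count = dose_per_image * pixel_size ** 2
print(f"dose per image: {dose_per_image:.2f} e-/A^2, "
      f"expected count per pixel: {expected_count:.1f}")

rng = np.random.default_rng(0)

def poisson_snr(mean_count, n_pixels=200_000):
    """Empirical mean/std of Poisson counts with the given expected value."""
    counts = rng.poisson(mean_count, size=n_pixels)
    return counts.mean() / counts.std()

for n in (1.0, 4.0, 16.0, 64.0):
    print(f"N = {n:5.1f}: measured mean/std = {poisson_snr(n):.2f}, "
          f"sqrt(N) = {np.sqrt(n):.2f}")
```

Each fourfold increase in the expected count should double the measured ratio, which is the square-root law stated above.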

Deep dive

Why Poisson, and why does SNR grow only as $\sqrt{N}$? Picture imaging as “throwing electrons” into each pixel: the electrons arriving at a pixel during an exposure are independent rare events, and such counting processes follow the Poisson distribution $P(k)=\frac{N^k e^{-N}}{k!}$, whose mean and variance both equal $N$. The relative fluctuation of a single measurement is therefore $\sqrt{N}/N = 1/\sqrt{N}$: more counts means smaller relative noise, but only at a square-root rate. This is the universal law of shot noise; it has nothing to do with detector quality and is the statistical limit of electron counting itself.

From this comes the central tension of Cryo-ET: doubling the SNR requires quadrupling the dose (doubling $\sqrt{N}$ means $N \times 4$), while radiation damage accumulates roughly linearly with dose. The SNR gain follows a square root and the damage follows a line, so each extra electron buys less SNR than the one before it while doing just as much damage. Past some dose the structural destruction outweighs the marginal SNR gain, and adding dose is a net loss. This is why the problem cannot be solved by “firing more electrons” and must instead be sidestepped by “averaging more copies.” In practice the total dose is collected as a series of frames (a dose-fractionated exposure) and the frames are combined with dose weighting: early frames retain undamaged high frequencies and enter at full weight, while late frames have lost their high frequencies and contribute only at low frequency. This preserves high-frequency information without discarding low-frequency signal.
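A toy model makes the "net loss past some dose" argument concrete. The exponential damage law, the `critical_dose` value, and the function `toy_snr` are assumptions introduced for illustration; they are not taken from the text or from any measured damage curve.

```python
import numpy as np

def toy_snr(total_dose, critical_dose):
    """Toy SNR at one spatial frequency after accumulating total_dose.

    Assumed model: the signal contributed per unit dose decays as
    exp(-dose / critical_dose) because of damage, while shot noise
    grows as sqrt(total_dose).
    """
    signal = critical_dose * (1.0 - np.exp(-total_dose / critical_dose))
    noise = np.sqrt(total_dose)
    return signal / noise

critical_dose = 20.0                       # hypothetical value, e-/A^2
doses = np.linspace(0.1, 200.0, 20_000)
snr = toy_snr(doses, critical_dose)
best_dose = doses[np.argmax(snr)]
print(f"toy optimum near {best_dose:.1f} e-/A^2; "
      f"SNR at 4x that dose is {toy_snr(4 * best_dose, critical_dose) / snr.max():.2f} of the peak")
```

In this model the optimum sits at roughly 1.26 times the assumed critical dose (the solution of $e^x = 1 + 2x$), and any further exposure lowers the SNR at that frequency. Because real damage is faster at high spatial frequency, the optimum differs per frequency, which is what dose weighting exploits.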

SNR therefore grows only as the square root of dose, while damage accumulates roughly linearly. This tension defines the central trade-off of Cryo-ET: more dose buys cleaner images but degrades the very structure being measured. The way out is not in any single image but in the number of copies — the $\sqrt{M}$ law below.

Why averaging works

Intuition

A single image is mostly noise, but the noise is random while the signal is fixed. Adding $M$ aligned copies with independent noise multiplies the signal by $M$ and the noise standard deviation by only $\sqrt{M}$, so averaging raises SNR by a factor of $\sqrt{M}$. This is precisely why thousands of copies of the same molecule are aligned and combined.

Plug in numbers to see how decisive this gain is: if a single image has $\text{SNR}\approx 0.05$, reaching a visible level (about $\text{SNR}\approx 5$) means raising SNR a hundredfold, i.e. $\sqrt{M}=100$, so $M=10{,}000$ copies. That is why subtomogram averaging routinely combines tens of thousands of particles: the copy count is not an engineering convenience but what the $\sqrt{M}$ law demands. Conversely, the square root in $\sqrt{M}$ means diminishing returns: going from $10{,}000$ to $40{,}000$ copies only doubles the SNR again, so every further step in resolution costs a multiplicative increase in particles.
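The same arithmetic as a small sketch. The helper names `copies_needed` and `snr_after_averaging` are hypothetical, and both assume perfectly aligned copies with independent noise.

```python
import math

def copies_needed(single_snr, target_snr):
    """Smallest M with single_snr * sqrt(M) >= target_snr."""
    gain = target_snr / single_snr
    # Small tolerance so floating-point round-off does not add a spurious copy.
    return math.ceil(gain ** 2 - 1e-9)

def snr_after_averaging(single_snr, n_copies):
    """SNR after averaging n_copies aligned copies with independent noise."""
    return single_snr * math.sqrt(n_copies)

m = copies_needed(0.05, 5.0)
print(f"copies needed to go from SNR 0.05 to 5: {m}")

ratio = snr_after_averaging(0.05, 40_000) / snr_after_averaging(0.05, 10_000)
print(f"SNR gain from quadrupling the copies: {ratio:.2f}x")
```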

The two preconditions “aligned and independent” deserve emphasis. Independent: the noise in each copy must be uncorrelated, or it will not cancel on summation (in the extreme, identical noise gives no averaging gain at all). Aligned: the signal must first be registered to a common pose before summing, or the signal blurs against itself — alignment error directly eats into the $\sqrt{M}$ gain you should have received, which is why pose estimation (angles and shifts) in subtomogram averaging has to be so accurate.
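Both preconditions can be seen in a 1D toy experiment. This sketch assumes a hypothetical Gaussian-bump template, integer shifts as the only pose parameter, and known true shifts for the aligned case; real subtomogram alignment has to estimate 3D rotations and shifts from the noisy data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_copies, noise_sigma = 256, 1000, 3.0
x = np.arange(n_samples)
template = np.exp(-0.5 * ((x - 128) / 4.0) ** 2)    # hypothetical narrow feature

def corr(a, b):
    """Pearson correlation between two 1D arrays."""
    return np.corrcoef(a, b)[0, 1]

# Each copy sees the template at a different (here: known) integer shift.
shifts = rng.integers(-20, 21, size=n_copies)
copies = np.stack([np.roll(template, s) for s in shifts])
noisy = copies + rng.normal(0.0, noise_sigma, size=copies.shape)

# Case 1: undo every shift before averaging (aligned, independent noise).
aligned = np.stack([np.roll(c, -s) for c, s in zip(noisy, shifts)]).mean(axis=0)

# Case 2: average without alignment, so the feature smears over the shift range.
unaligned = noisy.mean(axis=0)

# Case 3: aligned signal, but the identical noise realisation in every copy.
shared_noise = rng.normal(0.0, noise_sigma, size=n_samples)
correlated = np.tile(template + shared_noise, (n_copies, 1)).mean(axis=0)

corr_aligned = corr(aligned, template)
corr_unaligned = corr(unaligned, template)
corr_correlated = corr(correlated, template)
print(f"aligned: {corr_aligned:.2f}, unaligned: {corr_unaligned:.2f}, "
      f"shared noise: {corr_correlated:.2f}")
```

Only the aligned average with independent noise should correlate strongly with the template. The unaligned average blurs the feature, and the shared-noise average is no better than a single copy.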

SNR falls with frequency

SNR is not a single number; it varies with spatial frequency. Because the contrast transfer function (CTF) oscillates through zeros and because dose-dependent radiation damage attenuates high frequencies most severely, the finer the feature, the lower its SNR. Written as a function of spatial frequency $k$, $\text{SNR}(k)$ may be tolerable at low frequency (coarse outlines), where the signal is strong, but drops below one quickly at high frequency (fine detail).

This gives an operational definition of resolution: the effective resolution of a reconstruction is roughly set by the frequency at which $\text{SNR}(k)$ falls to order one. Beyond it the signal is buried in noise, and including higher frequencies adds only noise. In practice the per-frequency agreement is measured with Fourier shell correlation (FSC) between reconstructions from two half-datasets, and the frequency where the FSC curve crosses its threshold is reported as the resolution. So “improving resolution,” in the language of SNR, means “pushing the frequency at which $\text{SNR}(k)$ reaches one to higher frequency,” and the two routes for pushing it are the two method families below.
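The half-dataset comparison can be sketched in 2D, where the shells become rings (Fourier ring correlation). Everything specific here is an assumption for illustration: the two-blob test object, the noise level, the ring count, and the 0.143 threshold, which is a commonly used convention that the text does not specify.

```python
import numpy as np

def fourier_ring_correlation(img_a, img_b, n_rings=16):
    """Normalised cross-correlation of two images in rings of |k|."""
    fa, fb = np.fft.fft2(img_a), np.fft.fft2(img_b)
    ny, nx = img_a.shape
    ky = np.fft.fftfreq(ny)[:, None]
    kx = np.fft.fftfreq(nx)[None, :]
    radius = np.sqrt(kx ** 2 + ky ** 2)              # cycles per pixel
    edges = np.linspace(0.0, 0.5, n_rings + 1)       # up to Nyquist
    ring = np.digitize(radius, edges) - 1            # ring index of each pixel
    frc = np.zeros(n_rings)
    for i in range(n_rings):
        in_ring = ring == i
        num = np.real(np.sum(fa[in_ring] * np.conj(fb[in_ring])))
        den = np.sqrt(np.sum(np.abs(fa[in_ring]) ** 2)
                      * np.sum(np.abs(fb[in_ring]) ** 2))
        frc[i] = num / den
    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres, frc

rng = np.random.default_rng(0)
n = 64
yy, xx = np.mgrid[:n, :n]
# Hypothetical object: smooth blobs, so signal power falls with frequency.
truth = (np.exp(-((xx - 24) ** 2 + (yy - 30) ** 2) / (2 * 3.0 ** 2))
         + np.exp(-((xx - 40) ** 2 + (yy - 36) ** 2) / (2 * 5.0 ** 2)))

def half_average(n_copies, noise_sigma=2.0):
    """Average of n_copies aligned copies with independent Gaussian noise."""
    noise = rng.normal(0.0, noise_sigma, size=(n_copies, n, n))
    return truth + noise.mean(axis=0)

freqs, frc = fourier_ring_correlation(half_average(200), half_average(200))
threshold = 0.143                                    # assumed threshold
below = np.flatnonzero(frc < threshold)
cutoff = freqs[below[0]] if below.size else None
print("FRC per ring:", np.round(frc, 2))
print("first ring below threshold at", cutoff, "cycles/pixel")
```

The curve should stay near one at low frequency, where both halves agree on the signal, and fluctuate around zero at high frequency, where the halves share nothing but independent noise.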

Connection to reconstruction

There are two complementary routes for pushing usable signal toward that limit. The first is noise suppression by filtering: reweight each frequency according to $\text{SNR}(k)$ (this is what a Wiener filter does), keeping more where the signal is strong and suppressing where noise dominates. This raises overall visibility, but it can only rearrange existing signal and never creates information lost at the CTF zeros or erased by damage. The second is learned restoration by generative methods such as CryoGEN: write the CTF (frequency-direction degradation), the missing wedge (angle-direction degradation), and the low SNR together into the imaging model, and use a learned structural prior to fill in what filtering alone cannot. Both serve the same goal: push the frequency at which $\text{SNR}(k)$ reaches one higher, which is the same as pushing resolution finer. And the most basic statistical law, $\sqrt{M}$, is what lets subtomogram averaging recover high-resolution structure from copies that are individually buried in noise.
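The filtering route can be sketched with the standard denoising form of the Wiener weight, $W = \text{SNR}/(1+\text{SNR})$, where SNR is the power ratio (the square of the amplitude ratio used earlier). The SNR profile below is a hypothetical example and CTF correction is left out; this is not the filter of any particular package.

```python
import numpy as np

def wiener_weight(power_snr):
    """Denoising Wiener weight per frequency from its power SNR."""
    power_snr = np.asarray(power_snr, dtype=float)
    return power_snr / (1.0 + power_snr)

# Hypothetical SNR(k) profile that falls with spatial frequency.
k = np.linspace(0.0, 0.5, 6)                 # cycles per pixel
snr_k = 100.0 * np.exp(-k / 0.05)
weights = wiener_weight(snr_k)

for ki, si, wi in zip(k, snr_k, weights):
    print(f"k = {ki:.1f}: power SNR = {si:9.4f}, weight = {wi:.4f}")
```

Frequencies with SNR well above one pass almost unchanged, the weight is exactly one half where SNR equals one, and frequencies dominated by noise are suppressed toward zero. No weight can exceed one, which is the sense in which filtering rearranges signal without creating any.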
