Imaging and data acquisition

From sample prep and flash-freezing, through the electron optics and detector, to how a micrograph (or a whole tilt series) is actually collected.

Getting a dataset you can reconstruct from takes roughly three steps: prepare the sample, flash-freeze it, and image it. Every step is governed by one constraint — electrons are both the illumination and the source of damage, so dose is spent extremely sparingly.

Intuition

Think of the whole pipeline as “taking one very dim photograph on a fixed budget.” The budget is the total number of electrons the sample can absorb before it falls apart, measured in electrons per square ångström (e⁻/Å²). You cannot just turn up the brightness like a long film exposure: every electron that lands carries signal and breaks chemical bonds. So every acquisition decision — one shot or a whole series, how much defocus, how many frames — is really the same question: how do you spend that budget to best effect? All the noise, blur, and missing angles downstream trace back to this one budget line.

1. Sample prep and vitrification

The sample is applied to a grid covered in micron-scale holes, blotted with filter paper to leave a film tens to a few hundred nanometres thick, then plunged into liquid ethane and frozen in milliseconds. Ethane conducts heat fast and, unlike liquid nitrogen, does not wrap the sample in an insulating gas film — so the water vitrifies into glass before it can crystallize. Thin, and free of ice crystals, is the precondition for everything that follows.

Why does thin matter so much? Electrons crossing a watery ice slab scatter multiple times, and the thicker the ice, the smaller the fraction of “clean” singly-scattered signal — the image gets foggier. So an ideal film is often only tens of nanometres thick, the depth of one or two protein complexes. Too thick and the signal drowns in scattering; too thin and particles can be squashed out of shape or destroyed at the air–water interface. The thickness uniformity of this film directly sets how much of a grid is actually usable. For why vitrification preserves structure in a near-native state, and why crystalline ice is the enemy, see cryo-EM and vitrification.
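The thickness trade-off can be put in rough numbers. The sketch below assumes inelastic scattering events follow Poisson statistics, so the fraction of electrons that cross a film of thickness t without such an event is exp(-t/Λ). The mean free path Λ of 300 nm is an illustrative assumption, not a value from this page; real values depend on the accelerating voltage and on how scattering is counted.

```python
import math

# Assumed inelastic mean free path in vitreous ice, in nm (illustrative only).
MEAN_FREE_PATH_NM = 300.0

def unscattered_fraction(thickness_nm, mean_free_path_nm=MEAN_FREE_PATH_NM):
    """Fraction of electrons that cross the film without an inelastic event."""
    return math.exp(-thickness_nm / mean_free_path_nm)

for thickness_nm in (30, 100, 300):
    fraction = unscattered_fraction(thickness_nm)
    print(f"{thickness_nm:4d} nm ice: {fraction:.2f} of electrons unscattered")
```

Under these assumptions a 30 nm film leaves roughly nine electrons in ten untouched, while a film one mean free path thick leaves only about a third.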

2. The electron optics

Depth

A transmission electron microscope is one electron-optical column, top to bottom:

the electron gun (usually a field-emission gun, FEG) emits a coherent beam;
the condenser lenses shape it and illuminate the sample;
the stage holds the grid at liquid-nitrogen temperature and tilts it precisely;
the objective lens forms the image — its defocus and aberrations set the contrast transfer (CTF);
the projector lenses magnify the image onto the detector;
a direct electron detector (DED) counts electrons one by one and records them as a “movie” of frames.

This column follows the same logic as a light microscope (illuminate, focus, image, magnify, record), but with electrons instead of light and magnetic lenses instead of glass ones. The only reason to switch to electrons is resolution: an electron’s de Broglie wavelength is far shorter than that of visible light, on the order of picometres at the few-hundred-kilovolt accelerating voltages used, so in principle it can resolve down to the atomic scale. The price is that electrons must travel in high vacuum (otherwise air molecules scatter them), which is why the sample has to be frozen into vitreous ice that can survive the vacuum and a measure of irradiation in the first place.
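The "picometres" figure can be checked with the relativistic de Broglie relation, $\lambda = h / \sqrt{2 m_e e V \,(1 + eV / 2 m_e c^2)}$. The sketch below is only a worked calculation; the function name is illustrative, and 200 kV and 300 kV are chosen as typical accelerating voltages rather than taken from this page.

```python
import math

PLANCK = 6.62607015e-34              # J s
ELECTRON_MASS = 9.1093837015e-31     # kg
ELEMENTARY_CHARGE = 1.602176634e-19  # C
LIGHT_SPEED = 299792458.0            # m / s

def electron_wavelength_pm(voltage_volts):
    """Relativistic de Broglie wavelength in picometres."""
    energy = ELEMENTARY_CHARGE * voltage_volts      # kinetic energy, J
    rest_energy = ELECTRON_MASS * LIGHT_SPEED ** 2  # m c^2, J
    # p^2 = 2 m E (1 + E / (2 m c^2)), the relativistic momentum
    momentum = math.sqrt(
        2.0 * ELECTRON_MASS * energy * (1.0 + energy / (2.0 * rest_energy))
    )
    return PLANCK / momentum * 1e12

for kilovolts in (200, 300):
    wavelength = electron_wavelength_pm(kilovolts * 1e3)
    print(f"{kilovolts} kV: {wavelength:.3f} pm")
```

This gives about 2.5 pm at 200 kV and about 2.0 pm at 300 kV, several hundred thousand times shorter than visible light.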

The direct electron detector (DED) deserves its own note. It replaced the older CCD and film cameras, and the difference that matters is that it counts individual electrons and records a single exposure as a movie of dozens of frames. Those two properties (very low read noise in counting mode, and the ability to correct motion after the fact from the frames) are the hardware reason cryo-EM resolution went from “see the outline” to “see the side chains” over the past decade. The low-dose imaging and dose fractionation described in the next section are built on top of it.

3. Low-dose imaging and dose fractionation

To keep radiation damage minimal, the whole session is low-dose: focus on a nearby area, then give the target one very weak exposure. That exposure is split into a movie of dozens of frames (dose fractionation), so that beam-induced sample motion can be measured and the frames aligned and summed afterwards (motion correction). For contrast, images are usually taken underfocus, which imposes a defocus-dependent CTF that the reconstruction later has to deconvolve.

The numbers make it concrete. A single-particle exposure spends on the order of tens of e⁻/Å² in total (depending on the target resolution); at, say, 40 e⁻/Å² split into 40 frames, each frame gets only about 1 e⁻/Å². A single frame is almost pure noise, with no particle visible to the eye. But that is exactly the point of fractionation: align first, then sum. The sample moves most violently during the first few e⁻/Å² of exposure (charge builds up, the support film relaxes). A single long exposure would smear that motion across the image, but with frames you can measure the drift track frame by frame, align the frames, and only then add them up: the equivalent of steadying the camera before pressing the shutter.
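"Align first, then sum" can be illustrated with a toy one-dimensional example. This is a minimal sketch under strong assumptions: noiseless frames, whole-pixel drift, and circular shifts. The function names and drift values are made up for illustration. Real motion-correction software works on noise-dominated 2D frames, with sub-pixel and often patch-wise shifts.

```python
def circular_shift(signal, shift):
    """Shift a 1D list to the right by `shift` samples, wrapping around."""
    n = len(signal)
    return [signal[(i - shift) % n] for i in range(n)]

def estimate_shift(reference, frame):
    """Integer shift that maximizes the circular cross-correlation."""
    n = len(reference)

    def score(shift):
        return sum(reference[(i - shift) % n] * frame[i] for i in range(n))

    return max(range(n), key=score)

def align_and_sum(frames):
    """Undo each frame's estimated drift relative to frame 0, then sum."""
    reference = frames[0]
    total = [0.0] * len(reference)
    for frame in frames:
        shift = estimate_shift(reference, frame)
        restored = circular_shift(frame, -shift)
        total = [t + r for t, r in zip(total, restored)]
    return total

particle = [0, 0, 0, 1, 4, 1, 0, 0, 0, 0, 0, 0]  # toy "particle" profile
drift = [0, 2, 3, 4]                              # hypothetical per-frame drift, in pixels
frames = [circular_shift(particle, d) for d in drift]

naive = [sum(column) for column in zip(*frames)]  # sum without alignment: smeared
aligned = align_and_sum(frames)                   # align, then sum: sharp

print("naive peak:  ", max(naive))
print("aligned peak:", max(aligned))
```

The unaligned sum spreads the particle over several pixels and lowers its peak, whereas the aligned sum restores the full peak at the original position.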

Depth

Why underfocus rather than exact focus? A thin vitreous biological sample barely absorbs electrons; it carries information mostly by shifting the phase of the electron wave, and at exact focus that phase shift produces almost no visible light-and-dark contrast. Deliberate underfocus adds a defocus phase shift that converts part of the phase information into visible amplitude contrast, at the cost that this shift oscillates with spatial frequency and at some frequencies even flips the contrast (black and white swap). That is where the CTF comes from. More underfocus gives stronger low-frequency contrast (making particles easier to pick and center) but a denser CTF oscillation that suppresses high frequencies sooner. So acquisitions often stagger the defocus across images on purpose: the CTF zeros (information gaps) of one image are filled in by another, so no spatial frequency is lost from the combined data. The reconstruction has to estimate each image’s defocus first, then deconvolve to undo this modulation.
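The oscillation and the benefit of staggered defocus can be sketched with the textbook weak-phase CTF, $\mathrm{CTF}(k) = -\sin\chi(k)$ with $\chi(k) = \pi \lambda \Delta z\, k^2 - \tfrac{\pi}{2} C_s \lambda^3 k^4$. The sign convention (underfocus positive), the neglect of amplitude contrast and envelope damping, and the defocus values of 1.0 and 1.5 µm are all assumptions made for illustration.

```python
import math

def ctf(k, defocus_a, wavelength_a, cs_a=0.0):
    """Phase-contrast CTF at spatial frequency k (1/Å); underfocus is positive."""
    chi = (math.pi * wavelength_a * defocus_a * k ** 2
           - 0.5 * math.pi * cs_a * wavelength_a ** 3 * k ** 4)
    return -math.sin(chi)

def first_zero(defocus_a, wavelength_a):
    """First CTF zero (1/Å) when spherical aberration is neglected (cs_a = 0)."""
    return 1.0 / math.sqrt(wavelength_a * defocus_a)

WAVELENGTH_A = 0.0197                     # roughly 300 kV, in Å
k1 = first_zero(10_000.0, WAVELENGTH_A)   # 1.0 µm underfocus
k2 = first_zero(15_000.0, WAVELENGTH_A)   # 1.5 µm underfocus

print(f"first zero at 1.0 um: {1.0 / k1:.1f} Å")
print(f"first zero at 1.5 um: {1.0 / k2:.1f} Å")
# At the frequency where the 1.0 µm image carries no signal,
# the 1.5 µm image is at a contrast maximum.
print(f"CTF of the 1.5 um image at k1: {ctf(k1, 15_000.0, WAVELENGTH_A):.2f}")
```

More defocus pushes the first zero to lower spatial frequency, and the second image has full contrast exactly where the first has none, which is the point of staggering.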

4. Single shots vs a tilt series

Single-particle: collect thousands of micrographs across the sample, each holding many copies of one molecule in random orientations.
Tomography (cryo-ET): stay on one region and collect a whole tilt series under a tilt scheme (e.g. the dose-symmetric scheme, tilting outward from 0° to both sides), one image per angle. Limited by the stage and the sample thickness, the tilt usually stops near ±60° — and that is the origin of the missing wedge.
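The idea of tilting outward from 0° to both sides can be sketched as an acquisition order. This is a simplified alternating order, not the exact ordering any particular acquisition software uses; real schemes often group tilts to reduce stage movements. The 3° step is an assumed value.

```python
def alternating_tilt_order(max_tilt_deg=60, step_deg=3):
    """Acquisition order starting at 0° and alternating outward to both sides."""
    order = [0]
    for angle in range(step_deg, max_tilt_deg + 1, step_deg):
        order.extend([angle, -angle])
    return order

order = alternating_tilt_order()
print(order[:7])   # low tilts come first
print(len(order))  # number of images in the series
```

The low-tilt views, where the sample is thinnest along the beam, are recorded first while the sample has absorbed the least dose; the high-tilt views come last.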

The two paths are mirror images in how the dose is spent. Single-particle bets the whole budget on one angle for one shot, and assembles the 3D information computationally from “thousands of copies of one molecule in every orientation.” Each molecule contributes a single view, but the copy count is enormous, so averaging drives the signal-to-noise high. Tomography does the opposite: there is only one object (a slab of cell, an organelle), with no copies to average, so it rotates that object on the stage and takes one image at each of dozens of angles, spreading the same budget thin across the whole series. With total dose $D$ over $N$ images, each angle gets only about $D/N$, so every projection is extremely noisy. That is why a single tomographic image has far worse signal-to-noise than a single-particle one: it trades “few but unique views” for a “near-native, in-situ, complete scene.”
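The $D/N$ split is easy to put in numbers. The sketch below assumes shot-noise-limited imaging, where signal-to-noise scales with the square root of the dose; the doses (40 e⁻/Å² for a single-particle exposure, 120 e⁻/Å² over 41 tilts) are illustrative assumptions, not values from this page.

```python
import math

def dose_per_image(total_dose, n_images):
    """Dose each image receives when the budget is split evenly (D / N)."""
    return total_dose / n_images

def relative_snr(dose, reference_dose):
    """Shot-noise-limited SNR relative to an image taken at reference_dose."""
    return math.sqrt(dose / reference_dose)

single_particle_dose = 40.0       # e-/Å^2, assumed
tomo_total, n_tilts = 120.0, 41   # e-/Å^2 and image count, assumed

per_tilt = dose_per_image(tomo_total, n_tilts)
print(f"dose per tilt: {per_tilt:.2f} e-/Å^2")
print(f"SNR relative to a single-particle image: "
      f"{relative_snr(per_tilt, single_particle_dose):.2f}")
```

Even with a total budget three times larger, each tilt image in this example gets under 3 e⁻/Å², and its signal-to-noise is roughly a quarter of the single-particle image's.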

Tomography also carries an extra debt: in practice the tilt cannot usefully go past about ±60°. The electron path through the ice slab grows as $1/\cos\theta$, so it has already doubled at 60° and rises steeply beyond that, until the sample is too thick to penetrate. Views above that angle are simply never recorded. In Fourier space this missing block is wedge-shaped (the missing wedge), and it leaves the reconstruction inherently blurred in certain directions. Single-particle, with particles in random orientations that statistically fill all angles, in principle has no such gap.
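Both effects follow from simple geometry. The sketch below computes the $1/\cos\theta$ path-length factor and, assuming single-axis tilting over a symmetric range, the fraction of the 180° angular range that is never sampled. The function names are illustrative.

```python
import math

def path_length_factor(tilt_deg):
    """Beam path through a flat slab relative to the untilted path (1 / cos)."""
    return 1.0 / math.cos(math.radians(tilt_deg))

def missing_wedge_fraction(max_tilt_deg):
    """Unsampled fraction of the angular range for a single-axis tilt series."""
    return (90.0 - max_tilt_deg) / 90.0

for tilt in (0, 30, 45, 60, 70):
    print(f"{tilt:2d} deg: path x{path_length_factor(tilt):.2f}")

print(f"missing fraction at +/-60 deg: {missing_wedge_fraction(60):.2f}")
```

At 60° a 150 nm slab already looks 300 nm thick to the beam, and a ±60° series leaves one third of the angular range unrecorded.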

Collecting the data is not the same as having the structure: every image is deeply noisy and CTF-modulated, and the tilt series is missing a whole block of angles. Those fundamental limits are the subject of the next page (fundamental limits) — and what every method on this site sets out to fight.

From raw data to a 3D volume: what happens next

Before the data goes to the reconstruction, a few steps turn “a pile of noisy, individually drifting, individually defocused images” into a usable input. First comes alignment: the stage shifts mechanically across tilts and the sample itself drifts and deforms, so the images never line up; they have to be registered into one coordinate system using fiducials in the field (often gold beads) or image cross-correlation — the job of tilt-series alignment. Second comes per-image CTF estimation: a series has no single defocus value, because once tilted, different positions within one image sit at different heights along the beam, so the focal plane varies and defocus has to be estimated per tilt, sometimes strip by strip. Only after alignment and CTF correction do the projections go into reconstruction, where back-projection or iterative fitting combines them into a 3D volume.
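The reason defocus varies within one tilted image is geometric: a point at distance x from the tilt axis (measured in the image) sits x·tan θ higher or lower along the beam. The sketch below applies that to a few strips parallel to the tilt axis. The sign of the offset depends on the microscope's geometry and handedness, so the sign used here is an assumption, as are the field of view and the nominal defocus.

```python
import math

def height_offset_um(distance_from_axis_um, tilt_deg):
    """Height change along the beam at a given in-image distance from the tilt axis."""
    return distance_from_axis_um * math.tan(math.radians(tilt_deg))

def strip_defocus_um(nominal_defocus_um, strip_centers_um, tilt_deg):
    """Defocus per strip; the sign of the offset is an assumed convention."""
    return [
        nominal_defocus_um + height_offset_um(x, tilt_deg)
        for x in strip_centers_um
    ]

strip_centers = [-0.4, -0.2, 0.0, 0.2, 0.4]  # µm from the tilt axis, assumed
for tilt in (0, 30, 60):
    values = strip_defocus_um(3.0, strip_centers, tilt)
    print(tilt, [f"{v:.2f}" for v in values])
```

At 0° every strip shares the nominal defocus; at 60° the outer strips in this example differ from the axis by about 0.7 µm, which is why a single defocus value per image is not enough at high tilt.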

And because the input already carries this triple deficit — low signal-to-noise, CTF modulation, and the missing wedge — reconstruction is never a clean geometric inversion but an underdetermined inverse problem: the same projections can support several different 3D explanations. The statistical and learning-based methods on this site (see CryoGEN and related methods) exist precisely to fill that deficit — using priors and data constraints to infer the most credible structure from the incomplete evidence that was actually recorded.
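A toy example shows what "several different 3D explanations" means. With only two viewing directions (row sums and column sums of a 2×2 grid), two different objects produce identical projections. This is a deliberately tiny illustration of the ambiguity that missing views cause, not a model of a real reconstruction.

```python
def projections(image):
    """Sum along rows and along columns: two orthogonal projections."""
    rows = [sum(row) for row in image]
    cols = [sum(col) for col in zip(*image)]
    return rows, cols

object_a = [[1, 0],
            [0, 1]]
object_b = [[0, 1],
            [1, 0]]

print(projections(object_a))
print(projections(object_b))
print("same data, different objects:",
      projections(object_a) == projections(object_b) and object_a != object_b)
```

Nothing in these two projections can tell the objects apart; only an additional view (here, a diagonal one) or a prior about what the object should look like can resolve the ambiguity.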
