Movie frames and motion correction

A modern detector records a movie, not a picture — align the frames, dose-weight them, and sum, before anything else can happen

Intuition

At each tilt in a tilt series you think the camera took “a picture” — it didn’t. A modern direct electron detector slices one exposure into dozens of frames and records a movie. The reason is simple: the beam pushes the specimen around (beam-induced motion), the stage itself drifts, and the image is moving the entire time the exposure is open. If you just accumulate all that charge into a single image, the motion is baked in permanently. Record a movie instead, and the motion is left between the frames — motion correction aligns the frames to each other, then sums them into one sharp image. This is the very first step of the pipeline, before tilt-series alignment.

This is the first stage of the Software & data processing pipeline. This page assumes you have just received the raw data the microscope saved, and starts from zero: what that pile of files actually is, why it’s a movie, what motion correction does, and what goes in and comes out.

What a frame actually is

Get the physical picture straight first. A direct electron detector is a sensor that can count electrons one at a time. During the exposure, electrons pass through the specimen and land on the detector, which reads out fast enough to chop one exposure into $N$ time slices. Each slice is one frame: a 2D pixel image recording how many electrons landed on each pixel during that short interval. Stack those $N$ frames in time order and you have a frame stack, also called a movie. It is usually stored as one multi-page .mrc or .tif file: one tilt, one movie file.

A single frame is exposed so briefly, with so few electrons, that it is too noisy to make out any content. That is normal: it was never meant to be viewed alone. The useful signal only emerges once you add all the frames together. The catch is that you cannot add them naively; first you have to undo the shifts between frames.
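
To make the "noisy alone, useful when summed" point concrete, here is a minimal NumPy sketch. Everything in it is a toy assumption rather than real detector data: the image size, the per-pixel electron rates, the bright square standing in for specimen detail, and the helper name `contrast_to_noise`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy specimen: expected electrons per pixel per frame (very low dose).
n_frames, height, width = 40, 64, 64
signal = np.full((height, width), 0.05)
signal[24:40, 24:40] = 0.15  # a brighter square stands in for specimen detail

# A movie is a 3D array indexed (frame, y, x); each frame holds Poisson electron counts.
movie = rng.poisson(signal, size=(n_frames, height, width))

single = movie[0]          # one frame: almost all zeros and ones
summed = movie.sum(axis=0) # all frames added (no motion in this toy case)

def contrast_to_noise(img):
    """Difference between square and background, in units of background noise."""
    inside = img[24:40, 24:40]
    outside = img[:16, :16]
    return (inside.mean() - outside.mean()) / outside.std()

print(movie.shape)
print(contrast_to_noise(single), contrast_to_noise(summed))
```

In this toy case the square is close to invisible in `single` and stands out clearly in `summed`, which is the whole reason the frames get added.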

Tip

Motion correction also needs a gain reference: each detector pixel has slightly different sensitivity, and the gain reference is a calibration map of “how strongly each pixel responds.” Processing applies it to every frame (dividing or multiplying, depending on the convention the file was saved in) so this fixed per-pixel pattern isn’t mistaken for specimen structure. It is supplied with the data by the microscope; you don’t measure it yourself.
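
A small sketch of what applying a gain reference does, under a stated assumption: the map here stores per-pixel sensitivity, so the frame is divided by it. Some tools store the reciprocal and multiply instead, so check which convention your files use. The arrays are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-pixel sensitivity map: values near 1.0, fixed for the detector.
sensitivity = 1.0 + 0.1 * rng.standard_normal((32, 32))

# Flat illumination should give a flat image, but the detector imprints its own pattern.
true_counts = np.full((32, 32), 100.0)
raw_frame = true_counts * sensitivity

# Assumed convention: the gain reference stores sensitivity, so we divide it out.
corrected = raw_frame / sensitivity

print(raw_frame.std(), corrected.std())
```

After correction the fixed pattern is gone and only the (here perfectly flat) illumination remains.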

Why a movie, not an image

Summing the frame stack directly assumes the specimen held perfectly still during the exposure — it didn’t. Two sources move it:

Beam-induced motion: energy deposited by the beam makes the ice and specimen deform and jump, most violently in the first few frames. This is the specimen itself moving.
Stage drift: mechanical and thermal drift slowly translate the whole field of view across the exposure. This is the stage carrying the specimen moving.

Each individual frame is a noisy, low-dose snapshot, but the relative shift between frames can be measured. Measure those shifts well, undo them, then sum — and the result is almost as sharp as if the specimen had never moved. That is exactly why you store a movie: the motion is left between the frames, so there is still a chance to undo it; accumulate into one image up front and the motion is smeared in for good.
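
The difference between "sum directly" and "undo the shifts, then sum" can be shown with a deliberately simple toy: a noiseless single bright pixel drifting one pixel per frame. The drift and the image are assumptions chosen for clarity, and the shifts are taken as known here rather than measured.

```python
import numpy as np

# Hypothetical noiseless specimen: one bright pixel.
image = np.zeros((32, 32))
image[16, 16] = 1.0

# Assumed drift: one pixel to the right per frame, over 8 frames.
shifts_x = np.arange(8)
frames = np.stack([np.roll(image, s, axis=1) for s in shifts_x])

# Summing directly smears the point into an 8-pixel streak.
naive_sum = frames.sum(axis=0)

# Shifting each frame back before summing puts all the signal in one pixel again.
aligned_sum = np.stack(
    [np.roll(f, -s, axis=1) for f, s in zip(frames, shifts_x)]
).sum(axis=0)

print(naive_sum.max(), aligned_sum.max())  # 1.0 versus 8.0
```

The total signal is identical in both sums; the naive one has simply spread it over the path the specimen travelled.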

What motion correction does

The core operation is frame alignment: estimate each frame’s translation relative to a reference, shift the frames back into register, and sum them into one motion-corrected micrograph. Alignment comes at two granularities, matching the two kinds of motion:

Whole-frame alignment: estimate one global translation vector per frame. This corrects stage drift well — the whole field of view translates together.
Patch-based alignment: divide the field into a grid and estimate a separate time-varying trajectory for each patch, then interpolate a smooth local motion field. This corrects beam-induced motion — different parts of the frame move differently and must be corrected patch by patch.

In practice you use both: whole-frame alignment first to suppress the overall drift, then patch-based alignment to handle the uneven deformation within the frame. Common tools are MotionCor2, IMOD’s alignframes, and the motion correction built into RELION and Warp.
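
As an illustration of whole-frame alignment, here is a minimal sketch that finds one integer translation by locating the peak of an FFT-based cross-correlation. This is a simplified stand-in, not the algorithm of MotionCor2 or any other tool named above: real programs work at sub-pixel precision, filter the frames, and handle image edges. The function name `estimate_shift` and the synthetic data are assumptions.

```python
import numpy as np

def estimate_shift(reference, frame):
    """Integer (dy, dx) by which `frame` is displaced relative to `reference`."""
    # Circular cross-correlation via the FFT; its peak sits at the displacement.
    cc = np.fft.ifft2(np.fft.fft2(frame) * np.conj(np.fft.fft2(reference))).real
    peak = np.unravel_index(np.argmax(cc), cc.shape)
    # Peak indices beyond half the image size correspond to negative shifts.
    return tuple(int(p) if p <= n // 2 else int(p) - n for p, n in zip(peak, cc.shape))

rng = np.random.default_rng(2)
reference = rng.standard_normal((64, 64))            # hypothetical textured image
frame = np.roll(reference, (3, -5), axis=(0, 1))     # assumed drift of (3, -5) pixels
frame = frame + 0.5 * rng.standard_normal((64, 64))  # plus some noise

dy, dx = estimate_shift(reference, frame)
aligned = np.roll(frame, (-dy, -dx), axis=(0, 1))    # shift the frame back into register
print(dy, dx)  # recovers the applied shift, 3 and -5, in this toy case
```

One translation per frame is all this estimates, which is why it suits stage drift and not the uneven motion that patches are for.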

Deeper

The simplest model is a global rigid translation: frame $i$ is the true image shifted by $\mathbf{d}_i$, and alignment means estimating every $\mathbf{d}_i$ and shifting back before summing. Patch-based alignment relaxes this by letting the shift depend on position as well as time: one trajectory per patch, interpolated into a smooth local motion field.
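
A rough sketch of the patch idea, under simplifying assumptions: it measures one integer shift per patch for a single frame, using the same cross-correlation peak as whole-frame alignment would. Real tools go further by fitting smooth trajectories over time and interpolating between patches. The names `estimate_shift` and `patch_shifts`, the 2 x 2 grid, and the synthetic deformation are all illustrative.

```python
import numpy as np

def estimate_shift(reference, frame):
    """Integer (dy, dx) by which `frame` is displaced relative to `reference`."""
    cc = np.fft.ifft2(np.fft.fft2(frame) * np.conj(np.fft.fft2(reference))).real
    peak = np.unravel_index(np.argmax(cc), cc.shape)
    return tuple(int(p) if p <= n // 2 else int(p) - n for p, n in zip(peak, cc.shape))

def patch_shifts(reference, frame, grid=(2, 2)):
    """One integer shift per patch on a grid[0] x grid[1] grid."""
    ph, pw = reference.shape[0] // grid[0], reference.shape[1] // grid[1]
    shifts = np.zeros(grid + (2,), dtype=int)
    for i in range(grid[0]):
        for j in range(grid[1]):
            window = (slice(i * ph, (i + 1) * ph), slice(j * pw, (j + 1) * pw))
            shifts[i, j] = estimate_shift(reference[window], frame[window])
    return shifts

rng = np.random.default_rng(5)
reference = rng.standard_normal((64, 64))

# Assumed non-rigid motion: top half moves right by 2 pixels, bottom half left by 2.
frame = np.empty_like(reference)
frame[:32] = np.roll(reference[:32], 2, axis=1)
frame[32:] = np.roll(reference[32:], -2, axis=1)

shifts = patch_shifts(reference, frame)
print(shifts[..., 1])  # dx per patch: +2 in the top row, -2 in the bottom row
```

A single global translation could not describe this frame, since the two halves move in opposite directions; the per-patch estimates can.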

Note that a single frame’s signal-to-noise is far too low for reliable direct pairwise registration. In practice you iteratively build an improving summed reference and align each frame to it, so each weak frame is matched against the much stronger combined signal of the others rather than against one equally noisy neighbor.
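
The iterative-reference idea can be sketched as follows. This is a toy under stated assumptions, not any tool's actual scheme: shifts are integer, the specimen is synthetic random texture, and the reference for each frame is the sum of all the other frames (leaving the frame itself out so its own noise cannot pull the estimate toward zero shift). The name `estimate_shift` is illustrative.

```python
import numpy as np

def estimate_shift(reference, frame):
    """Integer (dy, dx) by which `frame` is displaced relative to `reference`."""
    cc = np.fft.ifft2(np.fft.fft2(frame) * np.conj(np.fft.fft2(reference))).real
    peak = np.unravel_index(np.argmax(cc), cc.shape)
    return tuple(int(p) if p <= n // 2 else int(p) - n for p, n in zip(peak, cc.shape))

rng = np.random.default_rng(3)
truth = rng.standard_normal((128, 128))        # hypothetical specimen
true_shifts = [(k, -k) for k in range(8)]      # assumed drift, in pixels
frames = [np.roll(truth, s, axis=(0, 1)) + 2.0 * rng.standard_normal(truth.shape)
          for s in true_shifts]

estimates = [(0, 0)] * len(frames)
for _ in range(2):                             # a couple of refinement passes
    for k, frame in enumerate(frames):
        # Reference: every other frame, shifted back by its current estimate, summed.
        reference = sum(np.roll(f, (-e[0], -e[1]), axis=(0, 1))
                        for j, (f, e) in enumerate(zip(frames, estimates)) if j != k)
        estimates[k] = estimate_shift(reference, frame)

# Estimates are only defined up to one common offset; check they agree on it.
offsets = {(e[0] - s[0], e[1] - s[1]) for e, s in zip(estimates, true_shifts)}
print(offsets)
```

If `offsets` contains a single value, every frame has been brought into register with the others. The absolute position is arbitrary; only the relative shifts matter for the sum.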

Dose fractionation and dose weighting

Splitting the exposure into frames buys a second thing: dose fractionation. The total electron dose is divided across the frames. Every frame is recorded with the same small slice of dose, but the specimen has absorbed almost none of it when the first frame is recorded and nearly all of it by the last. Radiation damage is cumulative, so later frames carry more damage, and the damage eats high frequencies (fine detail) first.

The intuition for “high frequencies first”: fine detail corresponds to dense, atomic-scale variation in density, the most fragile thing there is, so it is the first to be scrambled by ionization damage; the large-scale outline (low frequencies) is far more robust and survives much more dose largely intact. So whether a frame’s high frequencies are trustworthy depends on how much dose it has accumulated.

If you sum every frame with equal weight, the already-damaged high frequencies in the late frames contaminate the final image. Dose weighting instead applies a weight to each frame and each spatial frequency: early, low-dose frames keep more of their high frequencies, while the high frequencies of late, high-dose frames are suppressed — but those late frames’ low-frequency information is still useful and is kept. The weighting is frequency-dependent, allocated by asking “is this frequency in this frame still trustworthy?”
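
One common way to write such a weight is an exponential falloff in accumulated dose, with a "critical dose" that gets smaller at higher spatial frequency. The sketch below assumes that form. The constants in `critical_dose` are illustrative placeholders, not calibrated values, and both function names are made up for this example; use the model your tool documents.

```python
import numpy as np

def critical_dose(freq):
    """Hypothetical dose (e/Å^2) tolerated at spatial frequency `freq` (1/Å, nonzero)."""
    return 0.25 * freq ** -1.67 + 2.8   # illustrative constants, not calibrated values

def dose_weights(n_frames, dose_per_frame, freqs):
    """(n_frames, n_freqs) array of weights, one per frame and spatial frequency."""
    accumulated = dose_per_frame * np.arange(1, n_frames + 1)  # dose after each frame
    w = np.exp(-accumulated[:, None] / (2.0 * critical_dose(freqs)[None, :]))
    # Normalise per frequency so that summing frames with equal, independent noise
    # leaves the noise power the same at every frequency.
    return w / np.sqrt((w ** 2).sum(axis=0))

freqs = np.array([0.02, 0.1, 0.3])                 # low, medium, high frequency
weights = dose_weights(n_frames=10, dose_per_frame=0.3, freqs=freqs)

# Ratio of last-frame weight to first-frame weight, per frequency.
print(weights[-1] / weights[0])
```

The printed ratios stay near 1 at low frequency and drop at high frequency: late frames still contribute their coarse information but are trusted less for fine detail.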

Step by step

Putting the above into one minimal procedure (run once per tilt — as many times as there are tilts in the series):

Prepare the inputs. The raw frame stack for one tilt (a multi-page .mrc/.tif movie), plus the matching gain reference and the total dose of that exposure (used to compute dose weighting).
Apply the gain reference. Divide out the per-pixel sensitivity differences, removing the detector’s own fixed pattern from the image.
Whole-frame alignment. Estimate each frame’s global translation to suppress stage drift.
Patch-based alignment. Estimate a local motion field on a grid to correct the uneven beam-induced motion within the frame.
Dose-weighted sum. Weight each frame and each frequency by the dose that frame accumulated, then add them into one image.
Get the output. One motion-corrected, dose-weighted 2D image (micrograph) — what this tilt would have looked like if the specimen had never moved.

Run this for every tilt in the series and you get a stack of clean 2D images, exactly the input the next step, tilt-series alignment, needs.

Here is a generic command shape (placeholders in caps — replace them per your data and the tool’s documentation; no specific parameter values are written here):

# Shape only: one tilt's movie -> one motion-corrected, dose-weighted image
# "motioncor-tool" and its flags are stand-ins, not a real program's interface
motioncor-tool \
  -InMovie   RAW_MOVIE.mrc \
  -Gain      GAIN_REF.mrc \
  -OutImage  MOTIONCORR.mrc \
  -PatchGrid PATCHES_X PATCHES_Y \
  -TotalDose TOTAL_DOSE \
  -DoseWeight

For the actual runnable command and flags, follow the official documentation of the tool you use (MotionCor2 / IMOD / RELION / Warp) — the switch names differ between tools.

Even/odd splitting starts here

The frames also hand us a free source of independence: split the frame stack by parity (or into first and second halves), sum each group separately, and you get two independent half-images of the same exposure. They see the same specimen and the same signal, but their noise realizations are independent. That is the basis for unbiased resolution estimation and for many self-supervised denoising and reconstruction methods (see even/odd splitting) — and all of it depends on the detector having stored a movie rather than a single image.
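
A minimal sketch of the parity split, assuming the movie has already been aligned. The data are synthetic Poisson counts, and the `correlation` helper is just a convenience for this example.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical aligned movie: 40 frames of Poisson counts around a fixed signal.
signal = np.full((64, 64), 0.1)
signal[16:48, 16:48] = 0.3
movie = rng.poisson(signal, size=(40, 64, 64)).astype(float)

even_half = movie[0::2].sum(axis=0)   # frames 0, 2, 4, ...
odd_half = movie[1::2].sum(axis=0)    # frames 1, 3, 5, ...

def correlation(a, b):
    """Pearson correlation between two images."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

# The halves correlate with each other because they share the signal...
half_cc = correlation(even_half, odd_half)
# ...but once the known signal (20 frames' worth) is subtracted, the leftover noise does not.
noise_cc = correlation(even_half - 20 * signal, odd_half - 20 * signal)

print(half_cc, noise_cc)
```

With real data you never know `signal`, which is exactly why two halves with shared signal and independent noise are so useful: whatever agrees between them is, to a good approximation, specimen.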

Where it sits in the pipeline

Motion correction is the frontmost step:

tilt series (a movie at each tilt) → motion correction + dose weighting (each movie → one sharp, weighted image) → tilt-series alignment → tomographic reconstruction → downstream analysis.

Only after the motion between frames has been cleaned up and the damaged high frequencies have been suppressed does the alignment and reconstruction that follows get a sharp, trustworthy input. The reconstructed volume still carries a missing wedge, but that comes from the sampling geometry and has nothing to do with the inter-frame motion fixed here. They are different ailments at different stages of the pipeline.

