The Fourier transform & frequency domain

A decomposition of a signal into sinusoids that turns convolution into multiplication and underpins nearly every step of Cryo-ET processing

The Fourier transform breaks a signal into the pure frequencies that sum to it, much as a chord resolves into the individual notes it contains. Each note has a pitch (its frequency) and a loudness (its amplitude), and adding them back in the right proportions reconstructs the original signal; the transform is exactly that recipe. The same decomposition holds for images and tomographic volumes, except the “notes” become stripes that ripple across space.

This recipe is worth keeping because the same signal looks completely different in the two coordinate systems yet carries identical information. In real space you read values by position — this pixel is brighter, that edge is darker. In the frequency domain you read values by scale — how much slow light-to-dark variation the whole image contains, how much fine texture. Many operations that are tangled together and awkward to apply selectively in real space become independent and non-interfering once moved to the frequency domain. The imaging and reconstruction chain of Cryo-ET leans on this almost everywhere, so the frequency domain is not an optional trick but the field’s default language.

The Fourier transform expresses a signal as a superposition of complex sinusoids, replacing a function of position with a function of spatial frequency. For a continuous signal $f(x)$ the transform and its inverse are

$$F(k) = \int_{-\infty}^{\infty} f(x)\, e^{-2\pi i k x}\, dx, \qquad f(x) = \int_{-\infty}^{\infty} F(k)\, e^{\,2\pi i k x}\, dk,$$

where $k$ is spatial frequency (cycles per unit length). The complex value $F(k)$ encodes both an amplitude $|F(k)|$ and a phase $\arg F(k)$ at each frequency. Low frequencies describe slow, large-scale variation; high frequencies describe fine detail and sharp edges. In two and three dimensions the transform generalizes by integrating against $e^{-2\pi i\,\mathbf{k}\cdot\mathbf{x}}$, so an image or a tomographic volume has a corresponding amplitude-and-phase spectrum.

Reading the integral term by term makes it concrete. $f(x)$ is the signal under analysis; $e^{-2\pi i k x}$ is a “probe” sinusoid of frequency $k$; multiplying the two and integrating over all $x$ asks “how much of this one frequency does the signal contain?” If $f$ genuinely oscillates at that frequency, the product accumulates in phase and gives a large $F(k)$; if not, positive and negative parts cancel and the value sits near zero. The inverse transform treats these complex coefficients $F(k)$ as weights and superposes all frequencies back together, recovering $f(x)$ exactly. The two formulas are mirror images, differing only in the sign of the exponent, and because the inverse recovers $f(x)$ exactly, the transform loses no information.
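The “probe” reading of the integral can be tried numerically. The sketch below is only an illustration: the helper name `probe`, the 5-cycle test cosine, and the grid size are assumptions made for the example, and the integral is approximated by a Riemann sum over one unit of length.

```python
import numpy as np

def probe(signal, x, k):
    """Approximate F(k) = integral of f(x) * exp(-2*pi*i*k*x) dx by a Riemann sum."""
    dx = x[1] - x[0]
    return np.sum(signal * np.exp(-2j * np.pi * k * x)) * dx

# One unit of length sampled at 1000 points; the signal holds 5 cycles per unit.
x = np.linspace(0.0, 1.0, 1000, endpoint=False)
f = np.cos(2 * np.pi * 5 * x)

match = probe(f, x, k=5.0)      # probe frequency is present in the signal
mismatch = probe(f, x, k=8.0)   # probe frequency is absent from the signal

print(f"|F(5)| = {abs(match):.6f}")
print(f"|F(8)| = {abs(mismatch):.6f}")
```

At the matching frequency the product stays in phase and accumulates; at the mismatched frequency the positive and negative lobes cancel, which is the cancellation described above.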

The division of labor between amplitude and phase deserves its own emphasis, because it recurs throughout denoising and reconstruction. The amplitude $|F(k)|$ says “how much energy lives at each scale,” while the phase $\arg F(k)$ says “where the crests of those waves line up.” A classic demonstration: take the amplitude spectrum of one image and the phase spectrum of another, transform back, and the result looks far more like the image that supplied the phase — meaning an object’s outline and structure live mostly in the phase. This is why the microscope’s corruption of phase (the CTF, below) is so damaging, and why many reconstruction methods treat recovering the correct phase as more urgent than recovering the correct amplitude.
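The amplitude/phase swap is easy to reproduce. The sketch below uses two independent random textures as hypothetical stand-ins for real images (an assumption made so the example is self-contained); with photographs or micrographs the effect is visible by eye, while here it is measured with a correlation coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
img_a = rng.standard_normal((64, 64))   # hypothetical test image A
img_b = rng.standard_normal((64, 64))   # hypothetical test image B

spec_a = np.fft.fft2(img_a)
spec_b = np.fft.fft2(img_b)

# Hybrid spectrum: amplitude taken from A, phase taken from B.
hybrid_spec = np.abs(spec_a) * np.exp(1j * np.angle(spec_b))
hybrid = np.fft.ifft2(hybrid_spec).real

def correlation(u, v):
    """Pearson correlation between two images, flattened to vectors."""
    return np.corrcoef(u.ravel(), v.ravel())[0, 1]

corr_with_a = correlation(hybrid, img_a)   # image that supplied the amplitude
corr_with_b = correlation(hybrid, img_b)   # image that supplied the phase

print(f"correlation with amplitude donor A: {corr_with_a:+.3f}")
print(f"correlation with phase donor B:     {corr_with_b:+.3f}")
```

The hybrid correlates strongly with the phase donor and hardly at all with the amplitude donor, which matches the claim that structure lives mostly in the phase.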

Three properties make the frequency domain central to Cryo-ET. First, the transform is linear, so processing each frequency independently is well defined. Second, the convolution theorem states that convolution in real space equals multiplication in frequency space,

$$(f * g)(x) \;\xleftrightarrow{\;\mathcal{F}\;}\; F(k)\,G(k),$$

which converts the action of the microscope and of linear filters into a simple per-frequency weighting. Third, the energy of a signal is preserved between domains (Parseval’s theorem), so spectral magnitudes carry direct physical meaning.
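All three properties can be checked on a discrete signal. The following is a minimal sketch, assuming NumPy's unnormalized forward DFT (so Parseval's theorem picks up a factor of $1/N$) and circular convolution, which is the discrete counterpart of the continuous statement; the random test vectors are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 128
f = rng.standard_normal(n)
g = rng.standard_normal(n)

# 1. Linearity: the transform of a weighted sum is the weighted sum of transforms.
lhs = np.fft.fft(2.0 * f + 3.0 * g)
rhs = 2.0 * np.fft.fft(f) + 3.0 * np.fft.fft(g)

# 2. Convolution theorem: circular convolution from the definition...
direct = np.array([sum(f[m] * g[(j - m) % n] for m in range(n)) for j in range(n)])
# ...equals the inverse transform of the product of the spectra.
via_fft = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real

# 3. Parseval: energy in real space equals energy in frequency space (divided by N).
energy_real = np.sum(f ** 2)
energy_freq = np.sum(np.abs(np.fft.fft(f)) ** 2) / n

print("linearity holds:          ", np.allclose(lhs, rhs))
print("convolution theorem holds:", np.allclose(direct, via_fft))
print("Parseval holds:           ", np.isclose(energy_real, energy_freq))
```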

Depth

The convolution theorem is the technical core of all this. Convolution in real space, $(f*g)(x)=\int f(\tau)\,g(x-\tau)\,d\tau$, is an expensive point-by-point overlap: computing $N$ output points, each sweeping $N$ neighbors, costs $O(N^2)$ naively. The frequency domain reduces it to an elementwise product $F(k)\,G(k)$, which costs only $O(N)$ once the transforms are in hand. This equivalence underwrites two things. The first is computation: imaging, smoothing, sharpening, band-passing, and any other linear, shift-invariant operation is convolution by some kernel $g$, so every one of them becomes “FFT, multiply by a spectrum, inverse FFT.” The second is modeling: a real microscope acts on the specimen by convolution with a point-spread function $g$, so the observed spectrum equals the true spectrum times a transfer function $G(k)$. Read backward, this is the idea of deconvolution: knowing $G(k)$, one could in principle recover the truth as $F_\text{obs}(k)/G(k)$. But wherever $G(k)$ approaches zero (and the CTF’s zero crossings do exactly this), the signal at that frequency has been erased and the division only amplifies noise; the missing information has to be supplied by a prior or by complementary tilt angles instead.
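The failure of naive division can be seen in one dimension. Everything specific in this sketch is an assumption for illustration: the two-bump signal, the noise level, the oscillating transfer function `G` (a CTF-like stand-in, not a physical CTF model), and the constant `eps`. The regularized inverse is a simple Tikhonov-style filter, used here as one example of a prior; it is not a method taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 512
x = np.arange(n)
k = np.fft.fftfreq(n)   # spatial frequency in cycles per pixel

# Hypothetical true signal: two Gaussian bumps.
truth = np.exp(-(x - 200) ** 2 / (2 * 8.0 ** 2)) + 0.5 * np.exp(-(x - 320) ** 2 / (2 * 5.0 ** 2))

# Hypothetical CTF-like transfer function with zero crossings
# (for this choice they fall at |k| = 1/8 and 3/8 cycles per pixel, among others).
G = np.cos(32 * np.pi * k ** 2)

# Forward model: multiply the spectrum by G, then add noise in real space.
observed = np.fft.ifft(np.fft.fft(truth) * G).real + 0.01 * rng.standard_normal(n)
F_obs = np.fft.fft(observed)

# Naive deconvolution: divide by G, which blows up where G is close to zero.
naive = np.fft.ifft(F_obs / G).real

# Regularized inverse: eps keeps the denominator away from zero.
eps = 1e-2
regularized = np.fft.ifft(F_obs * G / (G ** 2 + eps)).real

def rms(e):
    return np.sqrt(np.mean(e ** 2))

err_naive = rms(naive - truth)
err_regularized = rms(regularized - truth)
print(f"RMS error, naive division:      {err_naive:.3e}")
print(f"RMS error, regularized inverse: {err_regularized:.3e}")
```

The regularized result is not a full recovery: at the zero crossings it simply returns nothing, which is why those frequencies have to come from elsewhere.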

Intuition

A sharp edge or a point of high contrast is built from many frequencies added in phase. Blur removes the high-frequency terms; the surviving low frequencies still outline the object but lose its fine structure. The spectrum is a complete, reversible bookkeeping of how much of each scale a signal contains.

A square wave is built from its odd harmonics. Adding more terms brings the sum closer to it, yet the jumps always keep an overshoot:

Figure: partial sum of 4 sine harmonics against the target square wave. More harmonics approach the square wave, but the overshoot at the jumps (the Gibbs phenomenon) never vanishes; it only narrows.

The demo also illustrates a commonly misread effect: the overshoot at the jumps (the Gibbs phenomenon) does not vanish as harmonics are added. Its height stays fixed at about 9% of the jump, merely squeezing into a narrower band. The lesson is that approximating a signal with sharp boundaries by a finite set of frequencies always leaves ringing near those boundaries. In Cryo-ET, aggressively truncating high frequencies or hard-zeroing a band often shows up as exactly this kind of fringe artifact.
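The fixed height and shrinking width of the overshoot can be measured directly. This is a minimal sketch, assuming a square wave that steps from -1 to +1 at $x = 0$ with period $2\pi$, whose Fourier series is $\tfrac{4}{\pi}\sum_{n\,\text{odd}} \sin(nx)/n$; the function name and grid resolution are choices made for the example.

```python
import numpy as np

def square_partial_sum(x, n_harmonics):
    """Sum of the first n_harmonics odd sine harmonics of a +/-1 square wave."""
    odd = 2 * np.arange(1, n_harmonics + 1) - 1   # 1, 3, 5, ...
    return (4 / np.pi) * np.sum(np.sin(np.outer(odd, x)) / odd[:, None], axis=0)

# Quarter period just to the right of the jump at x = 0.
x = np.linspace(0.0, np.pi / 2, 8001)
jump = 2.0   # the wave steps from -1 to +1

overshoots, peak_positions = [], []
for m in (4, 16, 64, 256):
    s = square_partial_sum(x, m)
    i = np.argmax(s)
    overshoots.append((s[i] - 1.0) / jump)   # overshoot as a fraction of the jump
    peak_positions.append(x[i])
    print(f"{m:4d} harmonics: overshoot = {overshoots[-1]:.2%} of the jump, peak at x = {x[i]:.4f}")
```

The overshoot fraction settles near 9% while the peak moves toward the jump, so the ringing narrows without shrinking.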

Because a measured image lives on a discrete grid, practical work uses the discrete Fourier transform, computed by the fast Fourier transform (FFT) in $O(N\log N)$ time. Discretization carries a cost too: moving a continuous signal onto a finite pixel grid caps the frequencies it can hold at the Nyquist frequency, one cycle per two pixels, which is the origin of the sampling limit. The grid’s finite extent also makes the spectrum itself discrete, which is equivalent to treating the image as periodic, so FFT-based convolutions wrap around at the edges unless the data are padded.
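Both consequences of discretization show up in a few lines. In this sketch the 16-sample signal, the 4-sample box kernel, and the pixel size of 2.0 (in arbitrary length units) are assumptions for illustration. A feature touching the right edge is convolved once without padding and once with zero padding.

```python
import numpy as np

signal = np.zeros(16)
signal[-2:] = 1.0                          # feature touching the right edge
kernel = np.array([1.0, 1.0, 1.0, 1.0])    # unnormalized 4-sample box kernel

# FFT convolution without padding is circular: the tail wraps to the left edge.
circular = np.fft.ifft(np.fft.fft(signal) * np.fft.fft(kernel, n=16)).real

# Zero-padding both inputs to len(signal) + len(kernel) - 1 removes the wrap-around.
n_full = len(signal) + len(kernel) - 1
linear = np.fft.ifft(np.fft.fft(signal, n=n_full) * np.fft.fft(kernel, n=n_full)).real

reference = np.convolve(signal, kernel)    # direct linear convolution

print("first samples, unpadded FFT:", np.round(circular[:3], 6))
print("first samples, direct:      ", reference[:3])
print("padded FFT matches direct:  ", np.allclose(linear, reference))

# Sampling limit: the highest frequency on the grid is 1 / (2 * pixel_size).
pixel_size = 2.0                           # hypothetical pixel size
freqs = np.fft.fftfreq(16, d=pixel_size)
nyquist = np.max(np.abs(freqs))
print("Nyquist frequency:", nyquist, "cycles per unit length")
```

Without padding, signal from the right edge leaks into the first samples on the left; with padding, the FFT result agrees with direct convolution.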

The frequency-domain viewpoint frames almost every later topic: the microscope’s action is a contrast transfer function that multiplies the spectrum, applying the convolution theorem to real imaging physics; finite pixels impose a sampling limit; the point spread and convolution become a simple per-frequency weighting; noise suppression is performed by filtering the spectrum; and the signal-to-noise ratio is read scale by scale in the frequency domain, telling us which frequencies to trust. The decisive step is reconstruction: the central-slice theorem says that the 2D Fourier transform of each projection is exactly one slice through the object’s 3D spectrum, passing through the origin and perpendicular to the projection direction. Transforming many tilted projections, assembling them into a full 3D spectrum, and inverting it therefore reconstructs the object. The region of frequency space that no slice covers, left empty by the limited tilt range, is the Cryo-ET missing wedge, and it is the very thing methods like CryoGEN and CryoWGEN set out to fill.
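The central-slice theorem can be verified exactly for the untilted case. This sketch assumes a random test volume with axes ordered (z, y, x) and a projection straight down the z axis; tilted projections would need interpolation onto oblique planes, which is not attempted here.

```python
import numpy as np

rng = np.random.default_rng(2)
volume = rng.standard_normal((16, 16, 16))   # hypothetical test volume, axes (z, y, x)

# Project along z, then take the 2D transform of the projection.
projection = volume.sum(axis=0)
proj_spectrum = np.fft.fft2(projection)

# Take the 3D transform of the volume and extract the k_z = 0 plane.
central_slice = np.fft.fftn(volume)[0, :, :]

print("projection spectrum equals central slice:", np.allclose(proj_spectrum, central_slice))
```

Each additional tilt angle fills in another plane through the origin; the planes that a limited tilt range never reaches make up the missing wedge.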
