Energy-based models
Probabilistic models that assign an unnormalized energy to every configuration, defining a density through the Boltzmann form.
To say which configurations are more likely than others, the direct route is to write down a density $p(x)$. But a density must be nonnegative everywhere and integrate to one, and those two constraints make it awkward to design by hand. Energy-based models start elsewhere: first give every configuration a score, its energy $E(x)$, with lower meaning more favorable; then mechanically turn scores into probabilities. The modeler only has to reason about relative preference; the formula handles normalization. The rest of this page shows how that conversion works, where its cost lies, and why a “compare, never normalize” structure fits Cryo-ET reconstruction so well.
An energy-based model (EBM) defines a probability density over configurations $x$ through a scalar energy function $E_\theta(x)$. Here $x$ is the object being modeled (in Cryo-ET, a 3D density volume) and $\theta$ are the parameters of the energy function itself (for instance the weights of a neural network). Low energy corresponds to high probability:

$$p_\theta(x) = \frac{\exp(-E_\theta(x))}{Z_\theta}, \qquad Z_\theta = \int \exp(-E_\theta(x))\,dx.$$
Figure: lower energy means higher probability, so the two well bottoms become the two peaks of the density. Higher $T$ flattens $\exp(-E/T)$, pushing the density toward uniform and erasing the contrast between the wells.
The local minima of the energy function are the modes of the density, and a deeper well concentrates more probability mass at that location. A temperature parameter $T$ rescales the energy inside the exponential, giving $p_\theta(x) \propto \exp(-E_\theta(x)/T)$: at low temperature the density sharpens and nearly all mass collects in the deepest well, while at high temperature it flattens toward uniform as configurations approach equal probability. The partition function $Z_\theta$ supplies only the overall normalization and leaves the shape of the density unchanged.
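The temperature effect can be checked numerically. The sketch below assumes a hypothetical one-dimensional double-well energy (the function `energy`, the tilt, and the grid bounds are illustrative choices, not taken from this page) and measures how much probability mass sits in the deeper well at three temperatures.

```python
import math

def energy(x):
    # Hypothetical double well: minima near x = -1 and x = +1,
    # tilted so that the right-hand well is the deeper one.
    return (x * x - 1.0) ** 2 - 0.3 * x

def mass_right_of_zero(T, lo=-3.0, hi=3.0, n=6001):
    # Riemann-sum the unnormalized weight exp(-E/T) on a uniform grid.
    # The grid spacing cancels in the ratio, so it is omitted.
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    e_min = min(energy(x) for x in xs)  # subtracted for numerical stability
    w = [math.exp(-(energy(x) - e_min) / T) for x in xs]
    total = sum(w)
    right = sum(wi for x, wi in zip(xs, w) if x > 0.0)
    return right / total

masses = {T: mass_right_of_zero(T) for T in (0.05, 0.5, 5.0)}
for T, m in masses.items():
    print(f"T={T}: mass in deeper well = {m:.3f}")
```

At the lowest temperature almost all mass should sit in the deeper well, and the fraction should drift toward one half as the temperature rises. Subtracting `e_min` before exponentiating is the constant energy shift that leaves the distribution unchanged.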
Symbol by symbol: $p_\theta(x)$ is the probability density of configuration $x$; $E_\theta(x)$ is the scalar energy assigned to it; $\exp(-\,\cdot\,)$ maps “low energy” monotonically to “large weight”; and $Z_\theta$ is the normalizing constant obtained by summing (integrating) $\exp(-E_\theta(x))$ over all possible $x$, which guarantees $\int p_\theta(x)\,dx = 1$. Note the minus sign in the exponent: it is what makes the lowest-energy configuration the most probable, matching the physical intuition that particles settle into low-energy states.
This is the Boltzmann–Gibbs form. The model places no architectural constraint on $E_\theta$: any function mapping $x$ to a real number induces a valid density once normalized, provided $Z_\theta$ is finite, which makes EBMs an extremely flexible family. A minimal example: take $E(x) = (x-\mu)^2 / (2\sigma^2)$ and the formula recovers a Gaussian with mean $\mu$ and variance $\sigma^2$, with $Z = \sqrt{2\pi\sigma^2}$ available in closed form. The Gaussian is “easy” precisely because its energy is quadratic; replace $E$ with a deep network and the density can have arbitrarily many wells of arbitrary shape, with the entire cost hidden inside $Z_\theta$.
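The Gaussian example can be verified in a few lines. This sketch assumes arbitrary illustrative values for the mean and standard deviation (`mu`, `sigma`), integrates the unnormalized weight of a quadratic energy on a grid, and compares the result with the closed-form Gaussian normalizer.

```python
import math

mu, sigma = 1.5, 0.7  # arbitrary illustrative values

def energy(x):
    # Quadratic energy E(x) = (x - mu)^2 / (2 sigma^2)
    return (x - mu) ** 2 / (2.0 * sigma ** 2)

# Numerically integrate exp(-E) over a wide interval (trapezoid rule).
lo, hi, n = mu - 10 * sigma, mu + 10 * sigma, 20001
dx = (hi - lo) / (n - 1)
xs = [lo + i * dx for i in range(n)]
w = [math.exp(-energy(x)) for x in xs]
Z_numeric = dx * (sum(w) - 0.5 * (w[0] + w[-1]))
Z_closed = math.sqrt(2.0 * math.pi) * sigma

# Moments of the normalized density p(x) = exp(-E(x)) / Z.
mean = dx * sum(x * wi for x, wi in zip(xs, w)) / Z_numeric
var = dx * sum((x - mean) ** 2 * wi for x, wi in zip(xs, w)) / Z_numeric

print(Z_numeric, Z_closed)
print(mean, var)
```

A grid sweep like this is only feasible because the example is one-dimensional; the same approach is exactly what becomes impossible for a high-dimensional volume.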
The difficulty is the partition function $Z_\theta$. For high-dimensional $x$ the integral is intractable: a voxel volume has roughly two million dimensions, and there is no closed form or feasible numerical sweep over it. So $Z_\theta$ cannot be evaluated, and neither can the likelihood. Practical use of EBMs is built around the observation that many quantities of interest do not require $Z_\theta$.
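The energy only ever matters by comparison. The probability ratio between two states $x_1$ and $x_2$ is

$$\frac{p_\theta(x_1)}{p_\theta(x_2)} = \exp\bigl(E_\theta(x_2) - E_\theta(x_1)\bigr),$$

in which $Z_\theta$ cancels. For instance, if $E_\theta(x_2) - E_\theta(x_1) = \Delta$, then $x_1$ is $e^{\Delta}$ times as probable as $x_2$, a conclusion that never touches the intractable $Z_\theta$. Sampling, ranking, and gradient-based search all depend on energy differences, not absolute probabilities. Equivalently, shifting the whole energy by a constant, $E_\theta(x) \to E_\theta(x) + c$, changes no distribution at all, because the factor $e^{-c}$ multiplies both the unnormalized weight and $Z_\theta$ and cancels.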
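Both claims, the cancellation in the ratio and the invariance under a constant shift, are easy to confirm on a toy system. The sketch below assumes a hypothetical four-state system with made-up energies; the helper `boltzmann` is an illustrative name, not part of any library.

```python
import math

# Hypothetical energies for a four-state toy system.
energies = [0.2, 1.0, 2.5, 0.7]

def boltzmann(E, T=1.0):
    # Normalize exp(-E/T) over the finite state space.
    w = [math.exp(-e / T) for e in E]
    Z = sum(w)
    return [wi / Z for wi in w], Z

p, Z = boltzmann(energies)

# Ratio from normalized probabilities vs. from the energy difference alone.
ratio_from_p = p[0] / p[1]
ratio_from_E = math.exp(energies[1] - energies[0])

# Shifting every energy by a constant changes Z but not the distribution.
c = 5.0
p_shift, Z_shift = boltzmann([e + c for e in energies])

print(ratio_from_p, ratio_from_E)
print(Z, Z_shift)
```

The second ratio is computed without ever forming the normalizer, which is the property that carries over to state spaces where the sum cannot be evaluated.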
The gradient of the log-density with respect to $x$, the score, is also free of $Z_\theta$:

$$\nabla_x \log p_\theta(x) = -\nabla_x E_\theta(x).$$
The derivation is one line: $\log p_\theta(x) = -E_\theta(x) - \log Z_\theta$, and since $Z_\theta$ does not depend on $x$ its gradient is zero, leaving only $-\nabla_x E_\theta(x)$. The score is a vector field that at each $x$ points in the direction of steepest increase in probability, that is, steepest decrease in energy. This is what makes gradient-based samplers such as Langevin dynamics compatible with EBMs: they only ever need to ask the energy function “which way is energy lower from here,” exactly the quantity backpropagation returns.
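A minimal sketch of unadjusted Langevin dynamics makes the point concrete. It assumes a one-dimensional quadratic energy so the target is a known Gaussian; `mu`, `sigma`, the step size, and the function names are illustrative choices, and the gradient is written by hand where an autodiff framework would normally supply it. The sampler only ever calls `grad_energy`.

```python
import math
import random

mu, sigma = 1.5, 0.7  # illustrative target: quadratic energy, i.e. a Gaussian

def grad_energy(x):
    # dE/dx for E(x) = (x - mu)^2 / (2 sigma^2); for a neural-network
    # energy this would come from backpropagation instead.
    return (x - mu) / sigma ** 2

def langevin_chain(n_steps, step=0.02, x0=0.0, seed=0):
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_steps):
        # Drift along the score (-grad E) plus Gaussian noise of variance 2*step.
        noise = rng.gauss(0.0, 1.0)
        x = x - step * grad_energy(x) + math.sqrt(2.0 * step) * noise
        samples.append(x)
    return samples

samples = langevin_chain(101_000)[1_000:]  # drop a short burn-in
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"sample mean {mean:.3f} (target {mu})")
print(f"sample variance {var:.3f} (target {sigma ** 2:.3f})")
```

Because the discretization is not corrected with an accept/reject step, the sample variance carries a small bias that shrinks with the step size; a smaller step trades that bias for slower mixing.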
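Maximum-likelihood training requires $\nabla_\theta \log Z_\theta = -\mathbb{E}_{x \sim p_\theta}[\nabla_\theta E_\theta(x)]$, an expectation under the model that demands sampling from $p_\theta$. Substituting it into the gradient of the log-likelihood gives an intuitive tug-of-war,

$$\nabla_\theta\, \mathbb{E}_{x \sim p_{\text{data}}}[\log p_\theta(x)] = -\mathbb{E}_{x \sim p_{\text{data}}}[\nabla_\theta E_\theta(x)] + \mathbb{E}_{x \sim p_\theta}[\nabla_\theta E_\theta(x)],$$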
where the first term pushes the energy of data points down and the second pushes up the energy of the model’s own samples; the two balance when the model distribution matches the data. The hard part is the expectation in the second term: it requires sampling from $p_\theta$, which is the partition-function problem wearing a different mask. Contrastive divergence approximates this expectation with short Markov chains initialized at the data, trading bias for affordable compute. Alternatives sidestep $Z_\theta$ entirely. Score matching never touches the likelihood and instead fits the model score to the data score, an objective in which only $\nabla_x E_\theta$ and its derivatives appear and which is therefore independent of $Z_\theta$. Noise-contrastive estimation turns density estimation into a binary classification problem against a known noise distribution, letting a classifier learn the unnormalized density as a byproduct of separating “real data” from “noise.”
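The tug-of-war can be watched in a toy setting where the model expectation is exact. The sketch below assumes a hypothetical four-state system with one energy parameter per state, so the gradient of the mean log-likelihood with respect to the energy of state $k$ reduces to (model probability of $k$) minus (data frequency of $k$). The data frequencies, learning rate, and function names are illustrative; in high dimensions the model term would have to be estimated from samples, for example with contrastive divergence.

```python
import math

data_freq = [0.5, 0.3, 0.15, 0.05]  # hypothetical empirical distribution
theta = [0.0] * len(data_freq)       # one energy value per state

def model_probs(theta):
    w = [math.exp(-e) for e in theta]
    Z = sum(w)  # tractable only because the state space is tiny
    return [wi / Z for wi in w]

def loglik_grad(theta):
    # d/d theta_k of the mean log-likelihood:
    #   -E_data[dE/dtheta_k] + E_model[dE/dtheta_k] = -q_k + p_k
    p = model_probs(theta)
    return [-q + pk for q, pk in zip(data_freq, p)]

for _ in range(2000):
    g = loglik_grad(theta)
    theta = [t + 0.5 * gk for t, gk in zip(theta, g)]  # gradient ascent

p_final = model_probs(theta)
print([round(pk, 3) for pk in p_final])
```

States the data visits more often than the model predicts have their energy lowered, and states the model over-weights have their energy raised; the updates stop when the two distributions agree.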
EBMs supply the probabilistic backbone for several reconstruction methods on this site: an energy prior encodes which 3D structures are plausible without ever evaluating $Z_\theta$. Smooth, connected volumes consistent with known biology get low energy, while fractured or artifact-laden volumes get high energy. Reconstruction multiplies this prior by the data likelihood, and CryoGEN takes exactly this route: CryoGEN-I finds a single lowest point on the energy landscape (a MAP point estimate), while CryoWGEN does not stop at one point but instead samples a family of solutions from that posterior, its inner loop being precisely the Langevin sampling above, which needs only the score $\nabla_x \log p$. The same “compare, never normalize” idea appears in optimal transport and in the Gibbs couplings of entropic transport, where an unnormalized exponential weight is likewise the central modeling object.