Bayesian inference
Describing unknowns with probability and updating a prior into a posterior from observed data via Bayes' rule.
Bayesian inference treats every unknown as a random variable and represents uncertainty about it with a probability distribution. Let $x$ denote the parameters or latent state to be estimated and $y$ the observed data. A model specifies two ingredients: a prior $p(x)$ that encodes belief about $x$ before observation, and a likelihood $p(y \mid x)$ that describes how the data are generated given $x$.
Picture inference as spreading belief across every possible truth. You start by distributing belief according to the prior: some values of $x$ look more reasonable and get more weight. Each observation then asks, “if the truth were this $x$, how likely is the data I just saw?”, and reweights accordingly. Values of $x$ consistent with the data gain weight; those that contradict it lose weight. The reweighted, renormalized distribution is the posterior. The process never picks a single answer; it reshapes the whole belief from the prior’s shape into the posterior’s.
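The reweight-then-renormalize picture can be sketched on a handful of candidate truths. This is a minimal illustration, not part of any method discussed here: the three candidate coin biases, the prior weights, and the function name `bayes_update` are all assumptions chosen for the example.

```python
def bayes_update(prior, likelihood):
    """Reweight each candidate by its likelihood, then renormalize to sum to one."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    normalizer = sum(unnormalized)
    return [u / normalizer for u in unnormalized]

# Hypothetical candidate truths: a coin's probability of heads.
candidates = [0.2, 0.5, 0.8]
# Hypothetical prior: the fair coin looks most reasonable.
prior = [0.25, 0.50, 0.25]

# One observed head. The likelihood of "heads" under each candidate
# is simply that candidate's bias.
after_one_head = bayes_update(prior, candidates)
# A second head reweights the already-updated belief.
after_two_heads = bayes_update(after_one_head, candidates)

for c, p0, p1, p2 in zip(candidates, prior, after_one_head, after_two_heads):
    print(f"bias={c:.1f}  prior={p0:.3f}  1 head={p1:.3f}  2 heads={p2:.3f}")
```

After one head the weight on the 0.8 candidate rises from 0.25 to 0.40 and the weight on 0.2 falls to 0.10; no candidate is ever selected outright, the whole distribution just shifts.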
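A directly Cryo-ET-relevant example: only one coordinate of a 2-D structure is measured, while the other goes unobserved, like the missing wedge. Watch how prior and likelihood combine into the posterior, and how the MAP point estimate differs from the full posterior:
[Interactive demo: prior, likelihood, and posterior for a 2-D structure with one unobserved coordinate. Status label shown: wide posterior (uncertain, a family of answers).]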
Bayesian inference writes recovering a clean structure from a noisy observation as one update: the prior $p(x)$ (amber, the energy prior: what structures are plausible) times the likelihood $p(y \mid x)$ (blue, what this observation says) gives the posterior $p(x \mid y)$ (purple, the updated belief). The MAP (amber tick) is the posterior's peak, which is all CryoGEN-I reports; the whole purple curve, peak plus width, is what CryoWGEN reports. The missing wedge weakens the observation in this direction and flattens the likelihood, so the posterior widens: the same gap admits a family of plausible answers. Drag toward ample data and the posterior tightens onto the MAP.
The posterior: updating belief with Bayes’ rule
After observing $y$, belief about $x$ is given by the posterior $p(x \mid y)$, obtained from Bayes’ rule:
$$p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)}, \qquad p(y) = \int p(y \mid x)\, p(x)\, dx.$$
Reading it term by term: the numerator $p(y \mid x)\, p(x)$ is the prior belief multiplied by the likelihood that this $x$ explains the data. The denominator $p(y)$ is the evidence, or marginal likelihood; it marginalizes over all $x$ (integrates $x$ out) and normalizes the posterior in $x$ into a valid probability distribution. Because $p(y)$ does not depend on $x$, it is just a constant scale factor, and the relation is often written
$$p(x \mid y) \propto p(y \mid x)\, p(x),$$
that is, posterior $\propto$ likelihood $\times$ prior. The symbol $\propto$ (proportional to) is a reminder that to recover actual probabilities you still divide by $p(y)$ so the area integrates to one. That dropped constant is irrelevant for a point estimate, but when comparing two different models, $p(y)$ itself measures how well each model, averaged over its prior, predicts the data.
The prior sets a starting point, the likelihood supplies the evidence carried by the data, and the posterior is a compromise between them. With ample data the likelihood dominates and the posterior concentrates; with scarce data the prior retains more influence.
A worked example: Beta-Binomial conjugacy
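A canonical closed-form example is the Beta-Binomial conjugate pair: to estimate a success probability $x$, a $\mathrm{Beta}(\alpha, \beta)$ prior combined with $k$ successes in $n$ Binomial trials, whose likelihood is $p(k \mid x) \propto x^{k}(1 - x)^{n - k}$, yields a Beta posterior, $\mathrm{Beta}(\alpha + k,\ \beta + n - k)$. The posterior mean $(\alpha + k)/(\alpha + \beta + n)$ lies between the prior mean $\alpha/(\alpha + \beta)$ and the data frequency $k/n$, and shifts toward $k/n$ as the sample size $n$ grows.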
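The conjugate update can be checked by multiplying prior and likelihood directly. In the derivation below, the success probability is written $x$, the prior is $\mathrm{Beta}(\alpha, \beta)$, and the data are $k$ successes in $n$ trials; these symbol names are notation assumed for this sketch. The last line rewrites the posterior mean as a weighted average of the prior mean and the data frequency, which makes the "lies between" claim explicit.

```latex
\begin{aligned}
p(x \mid k) &\propto
  \underbrace{x^{k}(1-x)^{n-k}}_{\text{likelihood}}\;
  \underbrace{x^{\alpha-1}(1-x)^{\beta-1}}_{\text{prior}} \\
 &= x^{(\alpha+k)-1}\,(1-x)^{(\beta+n-k)-1}
 \;\;\Longrightarrow\;\;
 x \mid k \sim \mathrm{Beta}(\alpha+k,\ \beta+n-k), \\[4pt]
\mathbb{E}[x \mid k] &= \frac{\alpha+k}{\alpha+\beta+n}
 = \frac{\alpha+\beta}{\alpha+\beta+n}\cdot\frac{\alpha}{\alpha+\beta}
 \;+\; \frac{n}{\alpha+\beta+n}\cdot\frac{k}{n}.
\end{aligned}
```

The two weights sum to one, and the weight on $k/n$ tends to one as $n$ grows, so the data frequency eventually dominates any fixed prior.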
Two priors make this tangible. With a uniform prior $\mathrm{Beta}(1, 1)$ (no knowledge of the success probability $x$) and $k$ successes in $n$ trials, the posterior is $\mathrm{Beta}(1 + k,\ 1 + n - k)$, with mean $(k + 1)/(n + 2)$: pulled by the data from the prior mean $1/2$ toward the frequency $k/n$, but not all the way when the sample is small. Swap in a strong symmetric prior $\mathrm{Beta}(a, a)$ with large $a$ (a firm belief the coin is nearly fair), and the same data gives only $\mathrm{Beta}(a + k,\ a + n - k)$, with mean $(a + k)/(2a + n)$: the prior drags the estimate back near $1/2$. Read the Beta parameters $\alpha, \beta$ as pseudo-counts: the prior acts as if you had already seen $\alpha$ successes and $\beta$ failures, and the real data simply add to those tallies. That is exactly how prior strength trades off against the amount of data.
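The pseudo-count reading can be tried with specific values. The numbers below (7 successes in 10 trials, and a strong prior of Beta(50, 50)) are illustrative assumptions, not figures taken from the text, and the helper names are hypothetical.

```python
def beta_binomial_update(alpha, beta, successes, trials):
    """Posterior Beta parameters: add successes and failures to the prior pseudo-counts."""
    return alpha + successes, beta + (trials - successes)

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Illustrative data (an assumption): 7 successes in 10 trials.
successes, trials = 7, 10

weak = beta_binomial_update(1, 1, successes, trials)      # uniform prior Beta(1, 1)
strong = beta_binomial_update(50, 50, successes, trials)  # hypothetical strong prior

print(weak, round(beta_mean(*weak), 3))      # (8, 4) 0.667
print(strong, round(beta_mean(*strong), 3))  # (57, 53) 0.518
```

With the uniform prior the mean moves most of the way from 0.5 to the frequency 0.7; with 100 prior pseudo-counts against 10 real observations it barely moves.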
When the prior and likelihood belong to matched families such that the posterior shares the prior’s family, the prior is called conjugate, and the posterior is available in closed form: updating just rewrites a few parameters. Conjugacy is one of the rare cases that sidesteps the integral; for most real models the evidence $p(y) = \int p(y \mid x)\, p(x)\, dx$ has no closed form and one resorts to MAP point estimates, variational inference, or Langevin sampling.
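To show why sampling avoids the normalizing integral, here is a minimal sketch of unadjusted Langevin sampling in one dimension. It only needs the gradient of the log posterior, in which the evidence drops out as a constant. The target below is a stand-in Gaussian posterior (mean 2.0, standard deviation 0.5) chosen so the result can be checked; the step size, run length, and function names are assumptions, and the sketch is not the sampler used by any method named in this article.

```python
import math
import random

def langevin_samples(grad_log_post, x0, step, n_steps, rng):
    """Unadjusted Langevin: drift along the log-posterior gradient plus Gaussian noise."""
    x = x0
    samples = []
    noise_scale = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        x = x + step * grad_log_post(x) + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples

# Stand-in posterior (an assumption): Gaussian with known mean and sd.
POST_MEAN, POST_SD = 2.0, 0.5

def grad_log_post(x):
    # Gradient of log N(x; POST_MEAN, POST_SD^2); the normalizer has zero gradient.
    return -(x - POST_MEAN) / POST_SD**2

rng = random.Random(0)
samples = langevin_samples(grad_log_post, x0=0.0, step=0.01, n_steps=50_000, rng=rng)
kept = samples[5_000:]  # discard the initial transient

est_mean = sum(kept) / len(kept)
est_sd = math.sqrt(sum((s - est_mean) ** 2 for s in kept) / len(kept))
print(f"estimated mean={est_mean:.2f}, sd={est_sd:.2f}")
```

Because there is no accept/reject correction, the samples carry a small step-size-dependent bias; a smaller step reduces it at the cost of slower mixing.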
Prediction and uncertainty
Prediction for a new observation $\tilde{y}$ is given by the predictive distribution, which averages over the posterior:
$$p(\tilde{y} \mid y) = \int p(\tilde{y} \mid x)\, p(x \mid y)\, dx.$$
This step is where the Bayesian approach parts ways with point estimation: instead of fixing a single $\hat{x}$ and predicting from it, every possible $x$ votes, weighted by its posterior probability $p(x \mid y)$. The other factor in the integrand, $p(\tilde{y} \mid x)$, is the likelihood of the new data under that $x$. When the posterior is broad (the parameters are uncertain), the predictive distribution widens accordingly, automatically propagating parameter uncertainty into predictions, which a point estimate cannot do.
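The difference between plugging in a point estimate and averaging over the posterior can be seen numerically. The sketch below assumes a Beta(8, 4) posterior over a success probability (a hypothetical choice) and predicts the number of successes in two new trials, once from the MAP value alone and once by integrating the Binomial likelihood against the posterior with a simple midpoint rule.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def binomial_pmf(j, m, x):
    """Probability of j successes in m trials with success probability x."""
    return math.comb(m, j) * x**j * (1 - x) ** (m - j)

def predictive_pmf(j, m, a, b, grid_size=2000):
    """Average the likelihood of the new data over a Beta(a, b) posterior (midpoint rule)."""
    h = 1.0 / grid_size
    total = 0.0
    for i in range(grid_size):
        x = (i + 0.5) * h
        total += binomial_pmf(j, m, x) * beta_pdf(x, a, b) * h
    return total

a, b = 8, 4                    # hypothetical posterior Beta(8, 4)
m = 2                          # number of new trials
x_map = (a - 1) / (a + b - 2)  # posterior mode

plug_in = [binomial_pmf(j, m, x_map) for j in range(m + 1)]
predictive = [predictive_pmf(j, m, a, b) for j in range(m + 1)]

for j in range(m + 1):
    print(f"{j} successes: plug-in={plug_in[j]:.3f}  predictive={predictive[j]:.3f}")
```

The predictive distribution puts more mass on the outcome the point estimate considers unlikely (zero successes), because posterior values of the success probability below the mode also get a vote.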
The posterior delivers a full uncertainty structure, not just a point. Its standard interval summary is worth distinguishing from the frequentist one. A credible interval comes straight from the posterior: $[a, b]$ is a 95% credible interval exactly when $P(a \le x \le b \mid y) = 0.95$, and it means literally “the posterior probability that $x$ lies in this interval is 0.95”. That is the very reading frequentist confidence intervals are so often mistaken for, but here it is the definition.
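A credible interval can be read off any posterior by accumulating probability mass. The sketch below computes an equal-tailed 95% interval on a grid; the Beta(8, 4) posterior and the function names are assumptions for illustration, and an equal-tailed interval is only one of several valid credible intervals.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def equal_tailed_interval(pdf, level=0.95, grid_size=20000):
    """Cut equal posterior mass from each tail of a density supported on (0, 1)."""
    h = 1.0 / grid_size
    xs = [(i + 0.5) * h for i in range(grid_size)]
    mass = [pdf(x) * h for x in xs]
    total = sum(mass)
    tail = (1.0 - level) / 2.0
    cumulative = 0.0
    lower = upper = None
    for x, w in zip(xs, mass):
        cumulative += w / total
        if lower is None and cumulative >= tail:
            lower = x   # first grid point where the lower tail is used up
        if cumulative >= 1.0 - tail:
            upper = x   # first grid point where only the upper tail remains
            break
    return lower, upper

# Hypothetical posterior Beta(8, 4).
lower, upper = equal_tailed_interval(lambda x: beta_pdf(x, 8, 4))
print(f"95% credible interval: [{lower:.3f}, {upper:.3f}]")
```

By construction, the posterior probability that the success probability lies between `lower` and `upper` is 0.95 up to grid resolution, which is exactly the statement the interval is defined to make.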
The evidence $p(y)$ looks like a mere normalizing constant, yet it is the engine of model comparison. For two models $M_1, M_2$, the ratio of evidences $p(y \mid M_1)/p(y \mid M_2)$ is the Bayes factor. It bakes in an Occam’s razor: an over-flexible model spreads its prior probability across many possible datasets, lowering $p(y \mid M)$ for any particular $y$, so a simpler model that still fits wins on evidence. Goodness-of-fit and model complexity end up unified in one quantity, with no separate penalty term.
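The Occam effect shows up even in a coin example. The sketch below compares two hypothetical models for $k$ successes in $n$ trials: a simple model in which the coin is exactly fair, and a flexible model with a uniform prior on the success probability, whose evidence integrates to $1/(n+1)$. The model pair and function names are assumptions chosen for illustration.

```python
import math

def evidence_fair(k, n):
    """Evidence of the simple model: success probability fixed at 0.5."""
    return math.comb(n, k) * 0.5**n

def evidence_uniform(k, n):
    """Evidence of the flexible model: uniform prior on the success probability,
    integrated out analytically via the Beta function B(k + 1, n - k + 1)."""
    log_beta = math.lgamma(k + 1) + math.lgamma(n - k + 1) - math.lgamma(n + 2)
    return math.comb(n, k) * math.exp(log_beta)

def bayes_factor(k, n):
    """Evidence ratio, simple model over flexible model."""
    return evidence_fair(k, n) / evidence_uniform(k, n)

print(f"5 of 10 successes: Bayes factor = {bayes_factor(5, 10):.2f}")
print(f"9 of 10 successes: Bayes factor = {bayes_factor(9, 10):.2f}")
```

With 5 of 10 successes both models fit, and the simple one wins because the flexible model spent prior mass on outcomes that did not occur. With 9 of 10 the fair coin fits poorly and the ratio drops below one.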
For decisions, the framework supplies a clean optimality criterion: given a loss function $L(x, \hat{x})$, the best estimate $\hat{x}$ minimizes the posterior expected loss $\mathbb{E}[L(x, \hat{x}) \mid y]$. Squared loss yields the posterior mean, absolute loss the posterior median, and 0-1 loss the posterior mode (the MAP). Different point estimates correspond to different loss assumptions.
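The link between loss functions and point estimates can be verified by brute force: discretize a posterior, evaluate the expected loss of every candidate estimate, and keep the minimizer. The Beta(8, 4) posterior, the grid size, and the function names below are assumptions for this sketch; on a grid, 0-1 loss means "wrong unless the estimate hits the same grid cell".

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

# Discretized hypothetical posterior Beta(8, 4) on grid midpoints.
GRID = 500
xs = [(i + 0.5) / GRID for i in range(GRID)]
weights = [beta_pdf(x, 8, 4) for x in xs]
total = sum(weights)
weights = [w / total for w in weights]

def best_estimate(loss):
    """Grid value that minimizes the posterior expected loss."""
    def expected_loss(c):
        return sum(w * loss(x, c) for x, w in zip(xs, weights))
    return min(xs, key=expected_loss)

squared = best_estimate(lambda x, c: (x - c) ** 2)              # posterior mean
absolute = best_estimate(lambda x, c: abs(x - c))                # posterior median
zero_one = best_estimate(lambda x, c: 0.0 if x == c else 1.0)    # posterior mode

print(f"squared loss  -> {squared:.3f}")
print(f"absolute loss -> {absolute:.3f}")
print(f"0-1 loss      -> {zero_one:.3f}")
```

For this left-skewed posterior the three answers differ and fall in the order mean, median, mode, so the choice of loss visibly changes the reported estimate.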
Where it sits in Cryo-ET reconstruction
In cryo-electron tomography, reconstruction can be framed as a Bayesian inverse problem: the unknown three-dimensional density plays the role of $x$, the tilt-series projections are $y$, the imaging model set by the CTF and noise gives the likelihood, and structural assumptions on the density act as the prior. The prior is not optional decoration here: projections span only a limited angular range (the missing wedge), so the likelihood carries almost no information along certain directions, and the posterior stays broad there. It is the prior that fills in where the data are silent. The demo above compresses this mechanism into one missing coordinate in 2-D.
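The one-missing-coordinate mechanism can be worked out exactly in a toy Gaussian setting. The sketch below assumes a zero-mean 2-D Gaussian prior and a noisy measurement of the first coordinate only; this linear-Gaussian model, the numbers, and the function name are illustrative assumptions and are far simpler than a real CTF-based imaging model.

```python
def posterior_one_coordinate(prior_cov, y, noise_var):
    """Posterior of a zero-mean 2-D Gaussian x when only y = x[0] + noise is observed.

    Standard Gaussian conditioning; returns (mean, covariance) as nested tuples.
    """
    (s00, s01), (s10, s11) = prior_cov
    innovation_var = s00 + noise_var  # variance of y under the prior
    gain = (s00 / innovation_var, s10 / innovation_var)
    mean = (gain[0] * y, gain[1] * y)
    cov = (
        (s00 - gain[0] * s00, s01 - gain[0] * s01),
        (s10 - gain[1] * s00, s11 - gain[1] * s01),
    )
    return mean, cov

y, noise_var = 1.0, 0.1

# Prior that couples the two coordinates (correlation 0.8): structure is assumed.
mean_c, cov_c = posterior_one_coordinate(((1.0, 0.8), (0.8, 1.0)), y, noise_var)
# Prior with no coupling: nothing links the unseen coordinate to the data.
mean_u, cov_u = posterior_one_coordinate(((1.0, 0.0), (0.0, 1.0)), y, noise_var)

print("coupled prior:   mean", mean_c, " unseen-coordinate variance", cov_c[1][1])
print("uncoupled prior: mean", mean_u, " unseen-coordinate variance", cov_u[1][1])
```

In both cases the measured coordinate is pinned down tightly. The unseen coordinate only moves, and its variance only shrinks, when the prior ties it to the measured one; with an uncoupled prior it keeps its full prior variance, which is the sense in which the prior fills in where the data are silent.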
Locating an optimum of the posterior and the role of regularization are treated in MAP, MLE & the EM algorithm. The four methods can be told apart by how they treat this posterior: CryoGEN-I takes the posterior mode (a MAP point estimate); CryoGEN-II returns a stable single answer via WAE/OT; and CryoWGEN-I and CryoWGEN-II refuse to collapse to one point, using EVIA (Monte-Carlo and Langevin respectively) to characterize a whole family of posterior samples — delivering how uncertain the density is along with the density itself.