CryoWGEN-I: Monte-Carlo sampling

Add an entropy term to the transport cost → a Boltzmann posterior; Monte-Carlo returns a family of reconstructions and makes the missing-wedge uncertainty readable

CryoGEN-II matches the aggregated distribution with optimal transport — the stable fix for CryoGEN-I’s GAN instability — but it returns one deterministic reconstruction per observation. CryoWGEN-I is the next step in the lineage: it is the Cryo-ET instantiation of the general tool EVIA (Entropic Variational Inference Auto-encoding), adding an entropy term to the transport cost so that a single answer becomes a family of answers. That family is what makes “which details are pinned by the data, and which are left free by the missing wedge” readable. This page covers the first realization — Monte-Carlo sampling; the more faithful iterative sampler is CryoWGEN-II.

Intuition

Missing-wedge restoration is ill-posed: one corrupted observation $y$ should correspond to many clean volumes $x$ that all fit, and the more tilt range is missing, the more volumes can fill that wedge without contradicting the measured data. Committing to one answer quietly picks a single member of this family and throws the rest away. CryoWGEN does not pick; it reports the distribution. The mechanism is a single change: add an entropy term to the optimal-transport cost, which forces the solution to stay spread out instead of collapsing to a point. The weight $\gamma$ on that entropy is a temperature: the hotter it is, the wider the family.

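[Figure: an observation $y$ is mapped through $\pi^\star \propto e^{-c/\gamma}$ to the posterior $q^\star(x\mid y)$, shown as candidate reconstructions together with the posterior mean.]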
CryoGEN-II returns a single deterministic reconstruction; CryoWGEN returns the whole posterior — its mean is the reconstruction, its spread captures the uncertainty the missing wedge leaves behind.

1. From CryoGEN-II to entropic regularization

Solving CryoGEN-II’s optimal transport exactly is expensive in high dimensions, and its solution is a hard assignment: the transport plan $\pi$ sends each $y$ to essentially one $x$. CryoWGEN adds an entropic regularizer to that plan, giving the entropic optimal-transport problem:

$$\mathcal{W}_{c,\gamma}(p_y,q_x)=\inf_{\pi\in\Pi(p_y,q_x)}\Big\{\mathbb{E}_{(y,x)\sim\pi}\big[c\big(y,\mathcal{T}_M(x)\big)\big]+\gamma\,\mathrm{KL}(\pi\,\|\,\kappa)\Big\}$$

Term by term: $\pi$ is the joint transport plan between $y$ and $x$, constrained to the couplings $\Pi(p_y,q_x)$ with marginals $p_y$ and $q_x$; $c(y,\mathcal{T}_M(x))$ is the mismatch cost after pushing a candidate volume $x$ back to the observation domain through the degradation operator $\mathcal{T}_M$ (which applies the missing wedge); $\kappa$ is a reference coupling; and $\mathrm{KL}(\pi\,\|\,\kappa)$ measures how far $\pi$ departs from that reference. This is the entropy term, and it penalizes any tendency of $\pi$ to collapse onto a point. The temperature $\gamma>0$ is the exchange rate between the two.
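For intuition, the entropic objective can be solved on a small discrete problem with Sinkhorn iterations. The sketch below is only an illustration and not the CryoWGEN training procedure. It assumes discrete marginals `p` and `q`, takes the reference coupling $\kappa$ to be the product $p\otimes q$ (an assumption; the text does not fix $\kappa$), and expects a precomputed cost matrix standing in for $c(y_i,\mathcal{T}_M(x_j))$. The name `sinkhorn_plan` is hypothetical.

```python
import math

def sinkhorn_plan(p, q, cost, gamma, n_iter=500):
    """Entropic OT between discrete marginals p (over y) and q (over x).

    Approximately minimizes sum(pi * cost) + gamma * KL(pi || p x q)
    over couplings pi whose row sums are p and column sums are q.
    """
    n, m = len(p), len(q)
    # Gibbs kernel relative to the assumed reference coupling kappa = p (x) q.
    K = [[p[i] * q[j] * math.exp(-cost[i][j] / gamma) for j in range(m)]
         for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Rescale rows, then columns, so the plan matches each marginal in turn.
        u = [p[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [q[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy problem: two observations, two candidates, matching pairs are free.
p = [0.5, 0.5]
q = [0.5, 0.5]
cost = [[0.0, 1.0], [1.0, 0.0]]
cold = sinkhorn_plan(p, q, cost, gamma=0.05)  # nearly one-to-one plan
hot = sinkhorn_plan(p, q, cost, gamma=10.0)   # plan close to the reference
```

Once the marginals are enforced, the kernel $\kappa\,e^{-c/\gamma}$ is rescaled by the row and column factors `u` and `v`. At low temperature the plan concentrates on the cheapest pairing; at high temperature it approaches the reference coupling.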

This one entropy term brings three concrete benefits.

First, the optimal plan has a closed form:

$$\pi^\star(y,x)\;\propto\;\kappa(y,x)\,\exp\!\Big(-\frac{c\big(y,\mathcal{T}_M(x)\big)}{\gamma}\Big),$$

so pairs $(y,x)$ with lower cost $c$ get exponentially higher probability, and $\gamma$ sets how steep that exponential is.

Second, this $\pi^\star$ is not something written down after the fact: it is the posterior sampled in the training E-step, so “solve the optimal transport” and “sample the posterior” are the same operation.

Third, as $\gamma\to0$ the entropy term vanishes and $\pi^\star$ reduces to CryoGEN-II’s deterministic hard transport. CryoWGEN is therefore not a fresh start but a “heated” version of CryoGEN-II: turn the temperature up and the answer fans into a family, turn it down and it contracts back to that one point.

How the temperature $\gamma$ controls the posterior width and the reconstruction uncertainty:

[Interactive figure: an energy well $E(x)$ whose minimum is the MAP estimate, the posterior $q(x\mid y) \propto e^{-E(x)/\gamma}$, and sample reconstructions drawn from it. At high temperature the posterior is wide: a family of reconstructions reflecting missing-wedge uncertainty (CryoWGEN).]

Temperature γ sets the posterior's width directly. Write data-consistency as an energy E(x) (the amber well); the posterior is the Boltzmann distribution in that well, q(x|y) ∝ e^(−E(x)/γ) (purple). As γ→0 it collapses to a spike at the bottom — one deterministic reconstruction, exactly WAE / CryoGEN-II; as γ grows it spreads into a family of reconstructions, and that width is the missing-wedge uncertainty CryoWGEN reports. The purple ticks along the bottom are sample reconstructions drawn from the posterior; they fan out as γ rises.
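The relation between temperature and width can be checked numerically on a toy energy. The sketch below assumes a hypothetical one-dimensional quadratic well $E(x)=\tfrac12(x-1)^2$ evaluated on a grid; the function names are illustrative and not part of CryoWGEN. For this particular well the Boltzmann distribution is Gaussian, so its standard deviation should come out close to $\sqrt{\gamma}$.

```python
import math

def boltzmann_on_grid(energy, grid, gamma):
    """Normalized weights q_i proportional to exp(-E(x_i) / gamma) on a grid."""
    e = [energy(x) for x in grid]
    e_min = min(e)  # shift by the minimum energy for numerical stability
    w = [math.exp(-(ei - e_min) / gamma) for ei in e]
    z = sum(w)
    return [wi / z for wi in w]

def posterior_std(grid, probs):
    """Standard deviation of the discrete distribution (grid, probs)."""
    mean = sum(x * p for x, p in zip(grid, probs))
    var = sum((x - mean) ** 2 * p for x, p in zip(grid, probs))
    return math.sqrt(var)

def energy(x):
    # Hypothetical data-consistency energy: quadratic well with minimum at x = 1.
    return 0.5 * (x - 1.0) ** 2

grid = [i / 100.0 for i in range(-500, 501)]  # x from -5 to 5

# Width of the posterior at three temperatures; it grows with gamma.
widths = {g: posterior_std(grid, boltzmann_on_grid(energy, grid, g))
          for g in (0.01, 0.1, 1.0)}
print(widths)
```

A real data-consistency energy is neither one-dimensional nor quadratic, so the square-root scaling is specific to this toy; the qualitative behavior (colder means narrower) is the point.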

Depth

If the encoder is trained to output the conditional mean $\mathbb{E}[q(x\mid y)]$ of this Boltzmann posterior, its objective coincides with an Entropy-SGD update: Entropy-SGD’s local entropy smooths the loss landscape over a neighborhood before descending, so it lands not on a sharp minimum but on the soft barycenter of that neighborhood. The encoder thus returns a point estimate akin to MAP but smoothed by entropy. Meanwhile the posterior $q(x\mid y)$ as a whole still captures reconstruction uncertainty, and its aggregate $\int q(x\mid y)\,p(y)\,dy$ stays close to the prior $p(x)$. In other words, the mean gives you one stable answer while the full family gives you the uncertainty, with no contradiction between them. This equivalence between entropic OT and the Langevin-style E-step is derived in the paper’s appendix.

2. Realizing the posterior by Monte-Carlo sampling

With the closed-form posterior $\pi^\star\propto\kappa\,\exp(-c/\gamma)$ in hand, the only remaining question is how to sample from it. CryoWGEN-I takes the most direct route, Monte-Carlo reweighting:

  1. draw a batch of paired candidates $(y,x)$ from a reference distribution;
  2. weight each candidate by the Boltzmann factor $\exp\!\big(-c(y,\mathcal{T}_M(x))/\gamma\big)$: the better it matches the observation (the smaller $c$), the larger its weight;
  3. use these weighted samples to estimate the posterior itself and its conditional mean $\mathbb{E}[q(x\mid y)]$.
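The three steps can be sketched on a toy problem. Everything concrete here is an assumption made for illustration: a candidate "volume" is a pair of numbers, the hypothetical `degrade` function stands in for $\mathcal{T}_M$ by keeping the first coordinate and dropping the second (the "missing wedge"), the cost is a squared error, and the reference distribution is a standard normal. None of these choices come from the CryoWGEN implementation.

```python
import math
import random

def degrade(x):
    """Hypothetical stand-in for T_M: keep the measured coordinate, drop the other."""
    return x[0]

def cost(y, tx):
    """Squared-error mismatch between the observation and a degraded candidate."""
    return 0.5 * (y - tx) ** 2

def mc_posterior(y, gamma, n_samples=20000, seed=0):
    rng = random.Random(seed)
    # Step 1: draw candidates from the reference (assumed standard normal).
    xs = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_samples)]
    # Step 2: Boltzmann weights, shifted by the minimum cost for stability.
    costs = [cost(y, degrade(x)) for x in xs]
    c_min = min(costs)
    w = [math.exp(-(c - c_min) / gamma) for c in costs]
    z = sum(w)
    w = [wi / z for wi in w]
    # Step 3: weighted mean and per-coordinate spread of the posterior.
    mean = [sum(wi * x[k] for wi, x in zip(w, xs)) for k in range(2)]
    std = [math.sqrt(sum(wi * (x[k] - mean[k]) ** 2 for wi, x in zip(w, xs)))
           for k in range(2)]
    return mean, std, xs, w

mean, std, xs, w = mc_posterior(y=1.0, gamma=0.1)
print("posterior mean:", mean)
print("posterior std: ", std)
```

In this toy the measured coordinate ends up with a narrow spread while the dropped coordinate keeps roughly the spread of the reference, which is a small-scale version of the family fanning out along unmeasured directions.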

It is conceptually simple: no inner optimization, just one round of sampling plus an exponential reweighting. And the whole procedure can be amortized into an encoder — train the encoder to output the weighted mean directly, so that at inference time it need not redraw a batch for every incoming observation, removing the runtime sampling cost.
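Amortization can be illustrated with a deliberately tiny stand-in for the encoder. The sketch below assumes a scalar signal, an identity degradation, a squared-error cost and a standard-normal reference, and it fits a linear map by least squares in place of a neural network; all names are hypothetical. The Monte-Carlo weighted means are computed once at training time, and the fitted `encoder` then answers new observations without drawing any samples.

```python
import math
import random

def weighted_mean(y, xs, gamma):
    """Monte-Carlo posterior mean for one observation y (identity degradation assumed)."""
    costs = [0.5 * (y - x) ** 2 for x in xs]
    c_min = min(costs)
    w = [math.exp(-(c - c_min) / gamma) for c in costs]
    return sum(wi * x for wi, x in zip(w, xs)) / sum(w)

def fit_linear_encoder(ys, targets):
    """Ordinary least squares for targets ~ a * y + b."""
    n = len(ys)
    my, mt = sum(ys) / n, sum(targets) / n
    a = (sum((y - my) * (t - mt) for y, t in zip(ys, targets))
         / sum((y - my) ** 2 for y in ys))
    return a, mt - a * my

rng = random.Random(0)
gamma = 0.5
xs = [rng.gauss(0.0, 1.0) for _ in range(5000)]      # shared reference samples
ys = [rng.uniform(-1.5, 1.5) for _ in range(100)]    # training observations
targets = [weighted_mean(y, xs, gamma) for y in ys]  # sampling cost paid at training time
a, b = fit_linear_encoder(ys, targets)

def encoder(y):
    """Amortized estimate of the posterior mean; no sampling at inference."""
    return a * y + b

print(a, b, encoder(1.0))
```

For this Gaussian toy the exact posterior mean is $y/(1+\gamma)$, so the fitted slope should land near $1/(1+\gamma)$; a linear map suffices only because the toy is linear-Gaussian.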

3. What it achieves, and its limit

What CryoWGEN-I actually delivers is the move from a single point to a distribution. For one observation it does not say “this is the answer” but returns a family of reconstructions all consistent with the measured data; along the wedge directions that were never measured, the family fans out, and the width of that fan makes the uncertainty explicit — a reader can see directly which structures are nailed down by the data and which are the model’s plausible fill-in inside the missing region. That is the gain of entropic regularization over CryoGEN-II’s single deterministic answer.

Its limit points straight to the next step. Monte-Carlo reweighting needs explicit access to the prior $p_x$ to draw the samples it reweights. Its independent sampling, together with the approximation introduced by amortizing it, can also be imprecise: when the posterior is concentrated and the reference is not well placed, scattering points over the reference distribution and reweighting them leaves few effective samples in the high-probability region, so the estimate gets coarse. Sampling the posterior more precisely calls for samples that are gradient-guided toward the high-probability region rather than passively scattered and culled, which is exactly why CryoWGEN-II switches to iterative Langevin (SGLD) sampling.
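One way to make "few effective samples" concrete is the Kish effective sample size, $(\sum_i w_i)^2/\sum_i w_i^2$. This is a standard importance-sampling diagnostic brought in here for illustration; the text does not say that CryoWGEN-I computes it. The sketch assumes the same scalar toy as before (standard-normal reference, squared-error cost, identity degradation), with hypothetical function names.

```python
import math
import random

def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    s1 = sum(weights)
    s2 = sum(w * w for w in weights)
    return s1 * s1 / s2

def boltzmann_weights(y, xs, gamma):
    """Unnormalized Boltzmann weights exp(-c / gamma), shifted for stability."""
    costs = [0.5 * (y - x) ** 2 for x in xs]
    c_min = min(costs)
    return [math.exp(-(c - c_min) / gamma) for c in costs]

rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(10000)]  # reference samples

# Wide posterior, well-placed reference: most samples contribute.
ess_warm = effective_sample_size(boltzmann_weights(0.0, xs, gamma=1.0))
# Concentrated posterior: only samples very close to y carry weight.
ess_cold = effective_sample_size(boltzmann_weights(0.0, xs, gamma=0.001))
# Concentrated posterior far from where the reference puts its mass.
ess_far = effective_sample_size(boltzmann_weights(4.0, xs, gamma=0.001))

print(ess_warm, ess_cold, ess_far)
```

The third case is the failure mode described above: a sharp posterior sitting where the reference rarely samples leaves only a handful of candidates carrying almost all of the weight.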


For the general algorithm see Generative Models · EVIA; the upstream deterministic reconstruction is CryoGEN-II, and the more faithful sampler is CryoWGEN-II.
