CryoWGEN-I: Monte-Carlo sampling
Add an entropy term to the transport cost → a Boltzmann posterior; Monte-Carlo returns a family of reconstructions and makes the missing-wedge uncertainty readable
CryoGEN-II matches the aggregated distribution with optimal transport — the stable fix for CryoGEN-I’s GAN instability — but it returns one deterministic reconstruction per observation. CryoWGEN-I is the next step in the lineage: it is the Cryo-ET instantiation of the general tool EVIA (Entropic Variational Inference Auto-encoding), adding an entropy term to the transport cost so that a single answer becomes a family of answers. That family is what makes “which details are pinned by the data, and which are left free by the missing wedge” readable. This page covers the first realization — Monte-Carlo sampling; the more faithful iterative sampler is CryoWGEN-II.
Missing-wedge restoration is ill-posed: one corrupted observation should correspond to many clean volumes that all fit — the more tilt range is missing, the more volumes can fill that wedge without contradicting the measured data. Committing to one answer quietly picks a single member of this family and throws the rest away. CryoWGEN does not pick; it reports the distribution. The mechanism is a single change: add an entropy term to the optimal-transport cost, which forces the solution to stay spread out instead of collapsing to a point. The weight on that entropy is a temperature — the hotter it is, the wider the family.
1. From CryoGEN-II to entropic regularization
Solving CryoGEN-II’s optimal transport exactly is expensive in high dimensions, and its solution is a hard assignment: the transport plan sends each observation y to essentially one volume x. CryoWGEN adds an entropic regularizer to that plan, giving the entropic optimal-transport problem:
min over π ∈ Π of ∫ c(x, y) dπ(x, y) + γ · KL(π ‖ π_ref)
Term by term: π is the joint transport plan between clean volumes x and observations y, constrained to the set Π of couplings with the prescribed marginals; c(x, y) is the mismatch cost after pushing a candidate volume x back to the observation domain through the degradation operator (which applies the missing wedge); π_ref is a reference coupling; and KL(π ‖ π_ref) measures how far π departs from that reference. This is the entropy term, and it penalizes any tendency of π to collapse onto a point. The temperature γ is the exchange rate between the two.
This one entropy term brings three concrete benefits:
- (i) Strictly convex, unique solution. With the entropy term added, the objective is strictly convex in the transport plan, so there is no GAN-style multiplicity or instability as in CryoGEN-I, just a unique global optimum.
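- (ii) The optimal coupling has a closed-form Gibbs (Boltzmann) density. Relative to the reference coupling, and up to the scaling factors that enforce the marginals, each pair (x, y) is weighted by the Boltzmann factor of its transport cost c(x, y) at temperature γ:
π*(x, y) ∝ e^(−c(x, y)/γ)
Pairs with lower cost get exponentially higher probability, and γ sets how steep that exponential is. This is not something written down after the fact: it is the posterior sampled in the training E-step, so “solve the optimal transport” and “sample the posterior” are the same operation.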
- (iii) No adversarial training. This closed-form posterior can be computed directly with Sinkhorn iterations or sampled with Langevin dynamics, sidestepping the generator/discriminator instability that plagued CryoGEN-I.
As γ → 0 the entropy term vanishes and the problem reduces to CryoGEN-II’s deterministic hard transport, so CryoWGEN is not a fresh start but a “heated” version of CryoGEN-II: turn the temperature up and the answer fans into a family, turn it down and it contracts back to that one point.
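The heating and cooling behaviour can be checked on a small discrete problem. The sketch below is only an illustration under stated assumptions: three 1D "observations" and three 1D "volumes", a squared-distance cost, uniform marginals, and a uniform reference coupling. None of these choices come from the paper, and the helper names `sinkhorn_plan` and `mean_row_entropy` are hypothetical. It solves the entropic problem with plain Sinkhorn scaling and measures how spread out each row of the plan is.

```python
import math

def sinkhorn_plan(cost, gamma, n_iters=500):
    """Entropic OT plan via Sinkhorn scaling (toy sketch, uniform marginals assumed)."""
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # marginal over rows (assumed uniform)
    b = [1.0 / m] * m  # marginal over columns (assumed uniform)
    # Gibbs kernel K_ij = exp(-c_ij / gamma)
    K = [[math.exp(-cost[i][j] / gamma) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

def mean_row_entropy(plan):
    """Average Shannon entropy of the conditional distributions pi(. | row)."""
    total = 0.0
    for row in plan:
        s = sum(row)
        total -= sum((p / s) * math.log(p / s) for p in row if p > 0)
    return total / len(plan)

points_y = [0.0, 1.0, 2.0]  # toy observations
points_x = [0.1, 1.1, 2.1]  # toy candidate volumes
cost = [[(y - x) ** 2 for x in points_x] for y in points_y]

hot = sinkhorn_plan(cost, gamma=5.0)    # high temperature: rows stay spread out
cold = sinkhorn_plan(cost, gamma=0.05)  # low temperature: rows collapse to one match
print(mean_row_entropy(hot), mean_row_entropy(cold))
```

At the higher temperature every observation keeps noticeable mass on all three candidates; at the lower one each row puts almost all of its mass on the nearest candidate, which is the hard-assignment limit described above.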
Figure (interactive): how the temperature γ controls the posterior width and the reconstruction uncertainty.
Temperature γ sets the posterior's width directly. Write data-consistency as an energy E(x) (the amber well); the posterior is the Boltzmann distribution in that well, q(x|y) ∝ e^(−E(x)/γ) (purple). As γ→0 it collapses to a spike at the bottom — one deterministic reconstruction, exactly WAE / CryoGEN-II; as γ grows it spreads into a family of reconstructions, and that width is the missing-wedge uncertainty CryoWGEN reports. The purple ticks along the bottom are sample reconstructions drawn from the posterior; they fan out as γ rises.
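The same relationship can be reproduced numerically. The following is a minimal sketch, assuming a hypothetical one-dimensional quadratic well E(x) = x² evaluated on a grid; the real energy is a data-consistency term over volumes, not this toy. The helper names `boltzmann_posterior` and `std_dev` are made up for the illustration.

```python
import math

def boltzmann_posterior(energies, gamma):
    """Normalize q_i ∝ exp(-E_i / gamma) on a grid, shifting by min(E) for stability."""
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / gamma) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def std_dev(xs, probs):
    """Standard deviation of a discrete distribution over the grid points xs."""
    mean = sum(x * p for x, p in zip(xs, probs))
    return math.sqrt(sum(p * (x - mean) ** 2 for x, p in zip(xs, probs)))

xs = [-3.0 + 6.0 * i / 600 for i in range(601)]  # grid on [-3, 3]
energy = [x * x for x in xs]                     # assumed toy well E(x) = x^2

widths = {g: std_dev(xs, boltzmann_posterior(energy, g)) for g in (0.01, 0.1, 1.0)}
for g, w in widths.items():
    print(f"gamma={g}: posterior width {w:.3f}")
```

For this particular well the posterior is Gaussian with standard deviation sqrt(γ/2), so the width shrinks toward zero as γ → 0 and grows as γ rises.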
If the encoder is trained to output the conditional mean of this Boltzmann posterior, its objective coincides with an Entropy-SGD update: Entropy-SGD’s local entropy smooths the loss landscape over a neighborhood before descending, so it lands not on a sharp minimum but on the soft barycenter of that neighborhood. The encoder thus returns a point estimate akin to MAP but smoothed by entropy; meanwhile the posterior as a whole still captures reconstruction uncertainty, and its aggregate stays close to the prior — that is, the mean gives you one stable answer while the full family gives you the uncertainty, with no contradiction between them. This equivalence between entropic OT and the Langevin-style E-step is derived in the paper’s appendix.
2. Realizing the posterior by Monte-Carlo sampling
With the closed-form posterior in hand, the only remaining question is how to sample from it. CryoWGEN-I takes the most direct route — Monte-Carlo reweighting:
- draw a batch of paired candidates from a reference distribution;
- weight each candidate by the Boltzmann factor e^(−c(x, y)/γ): the better it matches the observation (the smaller the cost c(x, y)), the larger its weight;
- use these weighted samples to estimate the posterior itself and its conditional mean E[x | y].
It is conceptually simple: no inner optimization, just one round of sampling plus an exponential reweighting. And the whole procedure can be amortized into an encoder — train the encoder to output the weighted mean directly, so that at inference time it need not redraw a batch for every incoming observation, removing the runtime sampling cost.
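The reweighting steps and the amortization can be sketched on a scalar toy problem. Everything specific here is an assumption made for illustration: a standard normal reference, an identity "degradation" so that the cost is (x − y)², and a linear least-squares fit standing in for the encoder network. The function names `mc_posterior_mean` and `fit_linear_encoder` are hypothetical.

```python
import math
import random

def mc_posterior_mean(y, gamma, n_samples, rng):
    """Self-normalized Monte-Carlo estimate of the conditional mean E[x | y]."""
    # Step 1: draw candidates from the (assumed) N(0, 1) reference.
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    # Step 2: Boltzmann weights exp(-c / gamma), shifted by min cost for stability.
    costs = [(x - y) ** 2 for x in xs]
    c_min = min(costs)
    ws = [math.exp(-(c - c_min) / gamma) for c in costs]
    # Step 3: weighted average of the candidates.
    return sum(w * x for w, x in zip(ws, xs)) / sum(ws)

def fit_linear_encoder(ys, targets):
    """Least-squares fit of x_hat = slope * y + intercept to the weighted means."""
    n = len(ys)
    mean_y = sum(ys) / n
    mean_t = sum(targets) / n
    sxy = sum((y - mean_y) * (t - mean_t) for y, t in zip(ys, targets))
    sxx = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    return slope, mean_t - slope * mean_y

rng = random.Random(0)
gamma = 1.0
train_ys = [-2.0 + 4.0 * i / 40 for i in range(41)]
targets = [mc_posterior_mean(y, gamma, 4000, rng) for y in train_ys]

# Amortization: after fitting, inference is one evaluation with no resampling.
slope, intercept = fit_linear_encoder(train_ys, targets)
print(slope, intercept)
```

In this Gaussian toy the exact conditional mean is 2y/(γ + 2), so the fitted slope should land near 2/3 for γ = 1. A real encoder would be a neural network trained on the same kind of weighted-mean targets.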
3. What it achieves, and its limit
What CryoWGEN-I actually delivers is the move from a single point to a distribution. For one observation it does not say “this is the answer” but returns a family of reconstructions all consistent with the measured data; along the wedge directions that were never measured, the family fans out, and the width of that fan makes the uncertainty explicit — a reader can see directly which structures are nailed down by the data and which are the model’s plausible fill-in inside the missing region. That is the gain of entropic regularization over CryoGEN-II’s single deterministic answer.
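A two-coordinate toy makes the "pinned versus free" distinction concrete. This is a sketch under loud assumptions: the candidate has one measured coordinate and one missing coordinate, the hypothetical degradation simply drops the second coordinate, and the reference is a standard normal in both. It is not the missing-wedge operator itself.

```python
import math
import random

def weighted_std(values, weights):
    """Standard deviation of values under normalized importance weights."""
    z = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / z
    return math.sqrt(sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / z)

rng = random.Random(1)
gamma = 0.05
y = 0.5  # the single measured value

# Candidates (x_measured, x_missing) drawn from the assumed N(0, I) reference.
samples = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(20000)]

# Assumed degradation: keep the first coordinate, discard the second,
# so the cost only sees the measured coordinate.
weights = [math.exp(-((x_meas - y) ** 2) / gamma) for x_meas, _ in samples]

spread_measured = weighted_std([s[0] for s in samples], weights)
spread_missing = weighted_std([s[1] for s in samples], weights)
print(spread_measured, spread_missing)
```

The measured coordinate comes out tightly concentrated near the observation, while the missing coordinate keeps roughly the full width of the reference: the data says nothing about it, so the family fans out along that direction.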
Its limit points straight to the next step. Monte-Carlo reweighting needs explicit access to the prior to draw the samples it reweights. Its independent sampling, and the approximation introduced by amortizing it, can also be imprecise: when the posterior is concentrated and the reference is not well placed, scattering points on the reference distribution and reweighting them leaves few effective samples in the high-probability region, so the estimate gets coarse. Sampling the posterior more precisely means guiding the samples toward the high-probability region by gradients rather than passively scattering and culling them, which is exactly why CryoWGEN-II switches to iterative Langevin (SGLD) sampling.
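One common way to quantify "few effective samples" is the Kish effective sample size of the importance weights, (Σw)² / Σw². The sketch below uses it as an assumed diagnostic; the source does not say which measure the authors use. The toy setup is again a standard normal reference with a squared-error cost, and the observation is deliberately placed away from the bulk of the reference.

```python
import math
import random

def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    s1 = sum(weights)
    s2 = sum(w * w for w in weights)
    return s1 * s1 / s2

def boltzmann_weights(xs, y, gamma):
    """Weights exp(-c / gamma) for the toy cost c = (x - y)^2, shifted for stability."""
    costs = [(x - y) ** 2 for x in xs]
    c_min = min(costs)
    return [math.exp(-(c - c_min) / gamma) for c in costs]

rng = random.Random(2)
xs = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # assumed N(0, 1) reference
y = 2.5  # observation far from where most reference samples fall

ess_by_gamma = {
    g: effective_sample_size(boltzmann_weights(xs, y, g)) for g in (1.0, 0.1, 0.01)
}
for g, ess in ess_by_gamma.items():
    print(f"gamma={g}: effective samples {ess:.1f} of {len(xs)}")
```

As the temperature drops and the posterior concentrates, only a small fraction of the 5000 draws carries meaningful weight, which is the coarseness that gradient-guided sampling is meant to avoid.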
For the general algorithm see Generative Models · EVIA; the upstream deterministic reconstruction is CryoGEN-II, and the more faithful sampler is CryoWGEN-II.