CryoWGEN-II: iterative Langevin sampling

Sample the Boltzmann posterior directly and iteratively with Langevin / SGLD — the same family, more faithful and tighter

CryoWGEN adds an entropy term to the optimal-transport cost, producing a Boltzmann posterior that yields a family of reconstructions per observation and captures the uncertainty inherent in missing-wedge restoration. That posterior admits two sampling routes: CryoWGEN-I estimates it by amortized Monte-Carlo, while CryoWGEN-II on this page samples it directly and iteratively with Langevin / SGLD. Both rest on the same entropic-variational foundation and differ only in how the posterior is sampled; CryoWGEN-II trades speed for precision, returning more faithful and tighter samples.

Intuition

CryoWGEN-I draws a batch of candidates from a reference distribution and reweights them by the Boltzmann factor. The reweighting is one-shot: you are stuck with whatever batch you drew, and if the candidates happen to land where the posterior has little mass, no weighting can recover what was never sampled. CryoWGEN-II inverts this — instead of scoring fixed candidates, it lets each candidate volume walk along the gradient of the posterior, step by step into the high-probability region, while injecting noise to avoid collapsing onto a single mode. In short, I is “guess a batch, then score it”; II is “iteratively refine toward the right answer, guided by the score.”

[Interactive figure: target density $p(x)$ and the sample histogram]

Hundreds of walkers start uniform and, under Langevin dynamics, drift along the gradient of the log-density with added noise until they settle on the two modes. A larger step is faster but, taken too far, overshoots detail. Only the gradient is needed — no normalizing constant.
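The walker picture above can be reproduced with a minimal one-dimensional sketch. It is an illustration only: the two-mode Gaussian mixture, the function names `grad_log_p` and `langevin_walkers`, and every numeric setting (mode positions, width, step size, number of steps) are assumptions chosen for the demo, not values from CryoWGEN-II. Only the gradient of the log-density appears in the update, so no normalizing constant is needed.

```python
import numpy as np

def grad_log_p(x, mu=2.0, sigma=0.5):
    """Gradient of log p for an equal-weight mixture of N(-mu, sigma^2) and N(+mu, sigma^2).

    The closed form (mu * tanh(mu * x / sigma^2) - x) / sigma^2 is numerically stable.
    """
    return (mu * np.tanh(mu * x / sigma**2) - x) / sigma**2

def langevin_walkers(n_walkers=500, n_steps=2000, eta=1e-3, seed=0):
    """Run independent Langevin walkers that start uniformly on [-4, 4]."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-4.0, 4.0, size=n_walkers)
    for _ in range(n_steps):
        noise = rng.standard_normal(n_walkers)
        # Drift along the gradient of the log-density, plus noise scaled to the step.
        x = x + eta * grad_log_p(x) + np.sqrt(2.0 * eta) * noise
    return x

walkers = langevin_walkers()
# Fraction of walkers that ended within 1.5 of either mode (at -2 or +2).
near_a_mode = (np.abs(np.abs(walkers) - 2.0) < 1.5).mean()
print("fraction near a mode:", near_a_mode)
print("fraction in the right-hand mode:", (walkers > 0).mean())
```

Raising `eta` shortens the number of steps needed but coarsens the histogram, which is the step-size trade-off described above.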

1. Why switch to Langevin

Independent Monte-Carlo sampling can be too imprecise. It draws once from a reference distribution and reweights, so the weights alone must close the gap between the candidates and the true posterior; candidates that land in the posterior tails contribute almost nothing, and the effective sample size drops quickly as the dimension grows. CryoWGEN-II instead uses Langevin dynamics (SGLD) to target the Boltzmann posterior directly and iteratively: each candidate volume undergoes several gradient-guided refinement steps that pull it toward the high-probability region. The result is higher-quality samples that track the posterior more closely, at the cost of more computation.

The drift term pulls each candidate toward the posterior’s high-probability region, while the injected Gaussian noise prevents the iterates from collapsing onto a single mode — exactly what distinguishes sampling (which must traverse the whole family) from maximization (which finds only one point).
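The claim that the effective sample size of one-shot reweighting drops with dimension can be checked on a toy problem. The sketch below is an assumption-laden illustration, not the CryoWGEN-I estimator: the reference is a standard Gaussian, the "posterior" is the same Gaussian shifted by `shift` in every coordinate, and the names `effective_sample_size` and `ess_for_dimension` are hypothetical.

```python
import numpy as np

def effective_sample_size(log_w):
    """Kish effective sample size (sum w)^2 / sum w^2, computed from log-weights."""
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    return w.sum() ** 2 / (w ** 2).sum()

def ess_for_dimension(d, n=5000, shift=0.5, seed=0):
    """ESS when N(0, I) candidates are reweighted toward N(shift * 1, I) in d dimensions."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d))  # one-shot batch from the reference
    # log target - log reference for two unit-covariance Gaussians.
    log_w = shift * x.sum(axis=1) - 0.5 * d * shift**2
    return effective_sample_size(log_w)

for d in (1, 4, 16, 64):
    print(f"d={d:3d}  ESS={ess_for_dimension(d):.1f} of 5000")
```

Even with this mild per-coordinate mismatch, a handful of candidates ends up carrying almost all the weight once the dimension is large, which is the failure mode iterative refinement is meant to avoid.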

Notation

The lines below use EVIA’s abstract symbols: data $x$, latent $z$, decoder $\mathcal{A}$. In Cryo-ET these instantiate as observation $y$, clean volume $x$, and the degradation operator $\mathcal{T}_M\circ R$ (a random rotation, then the missing wedge); $z$ and $\mathcal{A}$ live in EVIA’s abstract latent space, and $w$, $\lambda$, $\beta$ are the witness potential, the data-fit weight, and the conditional-prior precision.

2. The effective potential and the SGLD update

To sample directly we need a scalar potential we can take gradients of. Taking the conditional prior to be an isotropic Gaussian $\kappa(z\mid x)=\mathcal{N}(\bar z(x),\beta^{-1}I)$, the effective potential to minimize is a tractable Log-Sum-Exp:

$$\Psi(x;\beta,\bar z)=-\log\!\int\exp\!\Big\{w(z)-\tfrac{\lambda}{2}\|x-\mathcal{A}(z)\|_2^2-\tfrac{\beta}{2}\|z-\bar z\|_2^2\Big\}\,dz.$$

The exponent carries three terms, each doing one job. The witness potential $w(z)$ scores how plausible the latent code is on its own. The data-fit term $-\tfrac{\lambda}{2}\|x-\mathcal{A}(z)\|_2^2$ penalizes how far the decoding $\mathcal{A}(z)$ strays from the data $x$, with a larger weight $\lambda$ forcing the reconstruction to align with the data. The conditional-prior term $-\tfrac{\beta}{2}\|z-\bar z\|_2^2$ tethers $z$ near the center $\bar z(x)$, with a larger precision $\beta$ making that spring stiffer. Integrating over $z$ and taking $-\log$ gives the Log-Sum-Exp: a soft combination of these competing terms, hence differentiable everywhere and amenable to gradient sampling.
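To make the Log-Sum-Exp structure concrete, here is a minimal sketch that estimates $\Psi$ numerically. Because the conditional-prior term is a Gaussian in $z$, the integral equals $(2\pi/\beta)^{d/2}$ times an expectation over $z\sim\mathcal{N}(\bar z,\beta^{-1}I)$, so averaging the remaining two terms over prior draws gives an estimate. The function `psi_monte_carlo`, the callables `w` and `decode`, and the one-dimensional test case (zero witness potential, identity decoder) are assumptions for illustration; this is not necessarily how either CryoWGEN variant evaluates $\Psi$.

```python
import numpy as np

def logsumexp(a):
    """Numerically stable log(sum(exp(a)))."""
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

def psi_monte_carlo(x, z_bar, lam, beta, w, decode, n=20000, seed=0):
    """Estimate Psi(x; beta, z_bar) by sampling z from the Gaussian conditional prior."""
    rng = np.random.default_rng(seed)
    d = z_bar.shape[0]
    # z ~ N(z_bar, I / beta): this absorbs the -beta/2 * ||z - z_bar||^2 term.
    z = z_bar + rng.standard_normal((n, d)) / np.sqrt(beta)
    # Remaining exponent: witness potential plus data-fit term, one value per sample.
    f = w(z) - 0.5 * lam * ((x - decode(z)) ** 2).sum(axis=1)
    log_mean = logsumexp(f) - np.log(n)
    # Add back the Gaussian normalizer (2*pi/beta)^(d/2), then negate.
    return -(0.5 * d * np.log(2.0 * np.pi / beta) + log_mean)

# Toy check in 1-D with w = 0 and an identity decoder, where Psi has a closed form.
x = np.array([1.0])
z_bar = np.array([0.0])
lam, beta = 2.0, 1.0
psi_hat = psi_monte_carlo(x, z_bar, lam, beta,
                          w=lambda z: np.zeros(len(z)), decode=lambda z: z)
psi_exact = (-0.5 * np.log(2.0 * np.pi / (lam + beta))
             + 0.5 * lam * beta / (lam + beta) * (x[0] - z_bar[0]) ** 2)
print("estimate:", psi_hat, "closed form:", psi_exact)
```

In the toy case the closed form shows the soft combination directly: the penalty on $x-\bar z$ is weighted by $\lambda\beta/(\lambda+\beta)$, which is smaller than either $\lambda$ or $\beta$ alone.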

Depth

$\Psi$ contains an integral over $z$ with no closed form, but its gradients can be estimated with SGLD negative samples. Each negative sample is itself a Langevin chain evolving by:

$$z\;\leftarrow\;z+\eta\,\nabla_z\!\Big[w(z)-\tfrac{\lambda}{2}\|x-\mathcal{A}(z)\|_2^2-\tfrac{\beta}{2}\|z-\bar z\|_2^2\Big]+\sqrt{2\eta}\,\xi,\quad \xi\sim\mathcal{N}(0,I).$$

Read it as drift plus noise: the bracket is exactly the exponent above, and its gradient in $z$ is the drift pointing toward higher probability; the step size $\eta$ sets how far each step moves, and the final term $\sqrt{2\eta}\,\xi$ injects Gaussian noise scaled to that step so the chain traverses the whole posterior rather than stopping at one mode. After several iterations the distribution of $z$ approaches the Boltzmann posterior, and those samples estimate the gradient of $\Psi$ that drives the refinement of $x$. This is the same mechanism as the noisy gradient ascent in the Langevin / SGLD section; only the target density differs, here given by the effective potential above.
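The update and its use as a gradient estimator can be sketched on a toy problem. Differentiating the definition of $\Psi$ under the integral, and assuming $\bar z$ is held fixed with respect to $x$ (an assumption made here, not stated in the source), gives

$$\nabla_x\Psi(x)=\lambda\,\mathbb{E}_{z\sim p(z\mid x)}\big[x-\mathcal{A}(z)\big],$$

so averaging $x-\mathcal{A}(z)$ over negative samples estimates the gradient. In the code, `sgld_negative_samples`, `grad_w`, `decode`, and `decode_vjp` (the decoder's Jacobian-transpose product) are hypothetical names, and the linear decoder `M` with zero witness potential is chosen only because the posterior is then Gaussian and the answer can be verified in closed form.

```python
import numpy as np

def sgld_negative_samples(x, z_bar, lam, beta, grad_w, decode, decode_vjp,
                          eta=1e-2, n_steps=1500, n_chains=512, seed=0):
    """Run parallel Langevin chains on z; return the final state of each chain."""
    rng = np.random.default_rng(seed)
    z = np.tile(z_bar, (n_chains, 1))  # every chain starts at the prior center
    for _ in range(n_steps):
        residual = x - decode(z)
        # Gradient in z of  w(z) - lam/2 ||x - A(z)||^2 - beta/2 ||z - z_bar||^2.
        drift = grad_w(z) + lam * decode_vjp(z, residual) - beta * (z - z_bar)
        z = z + eta * drift + np.sqrt(2.0 * eta) * rng.standard_normal(z.shape)
    return z

# Toy setup (assumed): linear decoder A(z) = M z and witness potential w = 0.
M = np.array([[1.0, 0.5], [0.0, 1.0]])
decode = lambda z: z @ M.T          # batched A(z)
decode_vjp = lambda z, v: v @ M     # batched J^T v, here M^T v
grad_w = lambda z: np.zeros_like(z)

x = np.array([1.0, -1.0])
z_bar = np.zeros(2)
lam, beta = 2.0, 1.0

z = sgld_negative_samples(x, z_bar, lam, beta, grad_w, decode, decode_vjp)
grad_psi_estimate = lam * (x - decode(z)).mean(axis=0)

# Closed-form Gaussian posterior mean for comparison.
P = lam * M.T @ M + beta * np.eye(2)
mu = np.linalg.solve(P, lam * M.T @ x + beta * z_bar)
grad_psi_exact = lam * (x - M @ mu)
print("estimate:", grad_psi_estimate, "exact:", grad_psi_exact)
```

With a nonlinear decoder or a learned witness potential there is no closed form to compare against, and the number of steps and the step size would need tuning; the values here are placeholders.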

3. CryoWGEN-I vs CryoWGEN-II

The same shared observation under all four methods. CryoGEN (amber) returns one volume each; CryoWGEN (purple) returns a whole family — drag the wedge slider and watch the family fan out, CryoWGEN-I sampled coarsely by Monte-Carlo, CryoWGEN-II finely by Langevin:

CryoGEN · one curve: a single answer (point estimate)
CryoGEN-I · MAP: a single answer; the GAN-style energy carries bias, and it is overconfident
CryoGEN-II · global distribution matching (optimal transport): a more stable single answer, but still GAN-family bias
CryoWGEN · a family of curves: answers with uncertainty (a distribution)
CryoWGEN-I · Monte-Carlo: entropic-smoothed energy, a family closer to the truth (coarser)
CryoWGEN-II · Langevin: the same smooth energy; the most faithful sampling, the tightest band
Legend: true structure · CryoGEN (one) · CryoWGEN (a family)

The true structure is two peaks (grey dashed). The missing wedge makes the gap between them ambiguous — and the four methods answer it differently. CryoGEN gives one definite answer, but it learns a GAN-style energy surface that carries bias: CryoGEN-I (MAP) deviates most and is overconfident; CryoGEN-II uses optimal transport to stabilize training and match the overall distribution, so it deviates less — but it is still a single deterministic answer. CryoWGEN switches to entropic regularization (EVIA), whose energy surface is smoother — its reconstructions sit closer to the truth, and instead of one answer it returns a family: CryoWGEN-I by Monte-Carlo (coarser, widely spread), CryoWGEN-II by Langevin / SGLD (the most faithful sampling, the tightest band). The width of that band is the missing-wedge uncertainty, made readable. Drag the slider — the more is missing, the more ambiguous the gap.

| | CryoWGEN-I | CryoWGEN-II |
|---|---|---|
| Sampling | Monte-Carlo (amortizable) | iterative Langevin (SGLD) |
| Relation to the posterior | reweighted estimate of the Boltzmann posterior | multi-step gradient refinement, more faithful |
| Trade-off | simple, fast, lower precision | slower, higher precision |

Both rest on the same entropic-variational foundation and differ only in how the Boltzmann posterior is sampled: I is amortized, fast, and coarse; II is iterative, slower, and fine.

4. What it achieves and its limit

By replacing “reweight a fixed batch of candidates” with “refine each candidate step by step into the posterior,” CryoWGEN-II obtains higher-fidelity, tighter samples: they target the Boltzmann posterior more faithfully, the family clusters more tightly in the high-probability region and stays more credible in the tails, so its account of uncertainty is sharper than Monte-Carlo’s. The cost is more computation — each sample is no longer a single draw-and-weight but a full multi-step Langevin chain.

For that reason the two are not a matter of one replacing the other: they share the same entropic-variational foundation and only branch at the sampling step. Use CryoWGEN-I when you need speed and amortization; use CryoWGEN-II when you need a closer match to the posterior and can afford the compute. For the full lineage from a single MAP point to an entire posterior family, see the methods overview.


For the sampling mechanism see Inference · Langevin dynamics and SGLD; for the sister sampling route see CryoWGEN-I.
