Symbols
$x_0$: data
$x_1, \dots, x_T$: latent variables
$\beta_t$: variance of the diffusion process at step $t$
Key Concepts
Gaussian Distribution
1. Probability Density Function (PDF)
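For a continuous random variable $x \sim \mathcal{N}(\mu, \sigma^2)$ with mean $\mu$ and variance $\sigma^2$:
$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$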
2. Multivariate Gaussian Distribution
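Used when dealing with multiple correlated variables (common in 3D vision and diffusion models):
$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{x}-\boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)$$
- $d$: Number of dimensions.
- $\mathbf{x}$: $d$-dimensional vector.
- $\boldsymbol{\mu}$: Mean vector.
- $\Sigma$: Covariance matrix (defines the shape and orientation in multi-dimensional space).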
Addition of Gaussian Random Variables
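Consider two independent Gaussian random variables:
$$X \sim \mathcal{N}(\mu_1, \sigma_1^2), \quad Y \sim \mathcal{N}(\mu_2, \sigma_2^2)$$
Their sum $Z = X + Y$ still follows a Gaussian distribution:
Mean: $\mu_Z = \mu_1 + \mu_2$
Variance: $\sigma_Z^2 = \sigma_1^2 + \sigma_2^2$
Proof Of Variance
$$\mathrm{Var}(X+Y) = \mathbb{E}\big[(X - \mu_1 + Y - \mu_2)^2\big] = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y) = \sigma_1^2 + \sigma_2^2$$
The covariance term vanishes because $X$ and $Y$ are independent.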
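The mean and variance rules for a sum of independent Gaussians can be checked empirically. The following is a minimal sketch, not part of the original notes; the parameter values and the sample size are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for X ~ N(mu1, sigma1^2) and Y ~ N(mu2, sigma2^2)
mu1, sigma1 = 1.0, 2.0
mu2, sigma2 = -3.0, 0.5

n = 1_000_000
x = rng.normal(mu1, sigma1, size=n)
y = rng.normal(mu2, sigma2, size=n)  # drawn independently of x
z = x + y

empirical_mean = z.mean()
empirical_var = z.var()
expected_mean = mu1 + mu2            # means add
expected_var = sigma1**2 + sigma2**2  # variances add (not standard deviations)

print("mean:", empirical_mean, "expected:", expected_mean)
print("var: ", empirical_var, "expected:", expected_var)
```

Note that it is the variances that add, not the standard deviations; this is what later lets many small noise steps collapse into a single Gaussian.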
KL divergence
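Kullback-Leibler Divergence:
$$D_{KL}(p \,\|\, q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx$$
If
$$p = \mathcal{N}(\mu_1, \sigma_1^2), \quad q = \mathcal{N}(\mu_2, \sigma_2^2)$$
then:
$$D_{KL}(p \,\|\, q) = \log \frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2} - \frac{1}{2}$$
Furthermore, in DDPM, the variances are fixed rather than learned, so only the means depend on the trainable parameters:
$$D_{KL}(p \,\|\, q) = \frac{1}{2\sigma^2} \|\mu_1 - \mu_2\|^2 + C$$
where $C$ does not depend on the parameters (and $C = 0$ when $\sigma_1 = \sigma_2 = \sigma$). Up to a constant factor, this is the same as MSE between the means.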
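As a sanity check on the closed-form KL divergence between two scalar Gaussians, the sketch below compares it against a brute-force numerical integral. The function names `kl_gauss` and `kl_numeric` and the integration grid are assumptions made for this illustration.

```python
import numpy as np

def kl_gauss(mu1, sigma1, mu2, sigma2):
    """Closed-form KL( N(mu1, sigma1^2) || N(mu2, sigma2^2) ) for scalars."""
    return (np.log(sigma2 / sigma1)
            + (sigma1**2 + (mu1 - mu2)**2) / (2.0 * sigma2**2)
            - 0.5)

def kl_numeric(mu1, sigma1, mu2, sigma2):
    """Riemann-sum approximation of the integral of p * log(p / q)."""
    xs = np.linspace(-30.0, 30.0, 200_001)
    dx = xs[1] - xs[0]
    log_p = -0.5 * np.log(2 * np.pi * sigma1**2) - (xs - mu1)**2 / (2 * sigma1**2)
    log_q = -0.5 * np.log(2 * np.pi * sigma2**2) - (xs - mu2)**2 / (2 * sigma2**2)
    return np.sum(np.exp(log_p) * (log_p - log_q)) * dx

print(kl_gauss(0.5, 1.2, -0.3, 0.8), kl_numeric(0.5, 1.2, -0.3, 0.8))

# With equal variances the KL reduces to a scaled squared error of the means.
sigma = 0.7
print(kl_gauss(1.0, sigma, 0.2, sigma), (1.0 - 0.2)**2 / (2 * sigma**2))
```

The second print shows why a KL term between two fixed-variance Gaussians can be trained as a mean-squared error.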
Reparameterization Trick
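In generative models (such as VAE and DDPM), we need to sample from a distribution $z \sim \mathcal{N}(\mu, \sigma^2)$ whose parameters are produced by a network. The sampling operation itself is not differentiable, so gradients cannot pass through it.
The core idea of reparameterization is to express the random variable $z$ as a deterministic function of the parameters and an independent noise variable.
Suppose we have a Gaussian distribution with mean $\mu$ and variance $\sigma^2$:
$$z \sim \mathcal{N}(\mu, \sigma^2)$$
We can rewrite it as:
$$z = \mu + \sigma \cdot \epsilon, \quad \epsilon \sim \mathcal{N}(0, 1)$$
Two key changes happen here:
- Deterministic path: $\mu$ and $\sigma$ are now connected to $z$ through addition and multiplication. Since addition and multiplication are differentiable, gradients can flow directly back to the $\mu$ and $\sigma$ produced by the network.
- Externalized noise: all of the randomness is moved onto $\epsilon$. When differentiating, we treat $\epsilon$ as a constant coefficient.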
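In DDPM:
$$x_t = \sqrt{\alpha_t}\, x_{t-1} + \sqrt{1 - \alpha_t}\, \epsilon_{t-1}, \quad \epsilon_{t-1} \sim \mathcal{N}(0, I)$$
where $\alpha_t = 1 - \beta_t$.
Unrolling the recursion (and merging the independent Gaussian noise terms) gives
$$x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon, \quad \bar\alpha_t = \prod_{s=1}^{t} \alpha_s, \quad \epsilon \sim \mathcal{N}(0, I)$$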
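The unrolled form $x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon$ with $\bar\alpha_t = \prod_{s \le t} (1 - \beta_s)$ can be sketched in a few lines. The linear schedule from $10^{-4}$ to $0.02$ over $T = 1000$ steps is an assumption not stated in these notes, and `q_sample` and `iterate_moments` are hypothetical helper names. Array index `t` is 0-based, so index 0 corresponds to step 1.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)  # assumed linear variance schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)     # cumulative product: alpha_bar_t

def q_sample(x0, t, eps):
    """Jump from x0 straight to x_t using externally supplied noise eps."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def iterate_moments(t):
    """Track the coefficient on x0 and the noise variance step by step."""
    mean_coef, var = 1.0, 0.0
    for s in range(t + 1):
        mean_coef = np.sqrt(alphas[s]) * mean_coef
        var = alphas[s] * var + betas[s]  # variances of independent noise add
    return mean_coef, var

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8))
x_t = q_sample(x0, 499, rng.standard_normal(x0.shape))

mean_coef, var = iterate_moments(499)
print(mean_coef, np.sqrt(alpha_bars[499]))
print(var, 1.0 - alpha_bars[499])
```

The step-by-step recursion and the closed form agree, which is what allows training to skip the intermediate steps.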
Variational Inference (VI)
1. Motivation: The Intractability Problem
In Bayesian inference, we aim to compute the Posterior Distribution:
$$p(z \mid x) = \frac{p(x \mid z)\, p(z)}{p(x)}, \quad p(x) = \int p(x, z)\, dz$$
- The Issue: For complex models, the evidence $p(x)$ (the integral) is intractable to compute because the latent space is high-dimensional.
- The Goal of VI: Instead of computing $p(z \mid x)$ exactly, we approximate it with a simpler distribution $q(z)$ from a tractable family $\mathcal{Q}$.
2. Core Strategy: Inference as Optimization
VI transforms the integration problem into a functional optimization problem:
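Find $q^*(z) \in \mathcal{Q}$ that is most similar to $p(z \mid x)$.
The “similarity” is measured by Kullback-Leibler (KL) Divergence:
$$q^*(z) = \arg\min_{q \in \mathcal{Q}} D_{KL}\big(q(z) \,\|\, p(z \mid x)\big)$$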
3. The Evidence Lower Bound (ELBO)
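Since we cannot compute $D_{KL}(q(z) \,\|\, p(z \mid x))$ directly (it contains the intractable posterior), we use the decomposition:
$$\log p(x) = \text{ELBO}(q) + D_{KL}\big(q(z) \,\|\, p(z \mid x)\big)$$
Where the ELBO is defined as:
$$\text{ELBO}(q) = \mathbb{E}_{q(z)}[\log p(x, z)] - \mathbb{E}_{q(z)}[\log q(z)]$$
- Key Logic: Since $\log p(x)$ is fixed with respect to $q$ and $D_{KL} \ge 0$, maximizing the ELBO is equivalent to minimizing the KL divergence.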
4. Two Practical Interpretations of ELBO
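To implement VI in neural networks (like VAE or DDPM), we rewrite the ELBO as a reconstruction term minus a regularizer:
$$\text{ELBO}(q) = \mathbb{E}_{q(z)}[\log p(x \mid z)] - D_{KL}\big(q(z) \,\|\, p(z)\big)$$
| Component | Function |
|---|---|
| Reconstruction: $\mathbb{E}_{q(z)}[\log p(x \mid z)]$ | Maximizes the likelihood that latent $z$ reconstructs the data $x$. |
| Regularizer: $-D_{KL}(q(z) \,\Vert\, p(z))$ | Forces the approximate posterior $q(z)$ to stay close to the prior $p(z)$. |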
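The relationship between the evidence, the ELBO, and the KL divergence can be verified on a toy model where everything has a closed form. The sketch below assumes a hypothetical conjugate model, prior $p(z) = \mathcal{N}(0, 1)$ and likelihood $p(x \mid z) = \mathcal{N}(z, 1)$, for which $p(x) = \mathcal{N}(0, 2)$ and $p(z \mid x) = \mathcal{N}(x/2, 1/2)$. The model and all function names are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def kl_gauss(mu1, sigma1, mu2, sigma2):
    """KL( N(mu1, sigma1^2) || N(mu2, sigma2^2) ) for scalar Gaussians."""
    return (np.log(sigma2 / sigma1)
            + (sigma1**2 + (mu1 - mu2)**2) / (2.0 * sigma2**2) - 0.5)

def elbo(x, m, s):
    """ELBO for q(z) = N(m, s^2): reconstruction term minus regularizer."""
    reconstruction = -0.5 * np.log(2 * np.pi) - 0.5 * ((x - m)**2 + s**2)
    regularizer = kl_gauss(m, s, 0.0, 1.0)  # KL(q(z) || prior)
    return reconstruction - regularizer

def log_evidence(x):
    """log p(x) with p(x) = N(0, 2) for this toy model."""
    return -0.5 * np.log(2 * np.pi * 2.0) - x**2 / 4.0

def kl_to_posterior(x, m, s):
    """KL(q(z) || p(z|x)) with p(z|x) = N(x/2, 1/2)."""
    return kl_gauss(m, s, x / 2.0, np.sqrt(0.5))

x, m, s = 1.5, 0.3, 0.7
gap = log_evidence(x) - elbo(x, m, s)
print("log p(x) - ELBO:", gap)
print("KL(q || posterior):", kl_to_posterior(x, m, s))
```

The gap between the log evidence and the ELBO equals the KL divergence to the true posterior, and it closes when $q$ is set to that posterior.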
Pipeline


Inverse Process

Forward Process

note that
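By Bayes' rule, the single-step reverse of the forward process should be:
$$q(x_{t-1} \mid x_t, x_0) = \frac{q(x_t \mid x_{t-1}, x_0)\, q(x_{t-1} \mid x_0)}{q(x_t \mid x_0)} = \mathcal{N}\big(x_{t-1};\, \tilde\mu_t(x_t, x_0),\, \tilde\beta_t I\big)$$
with
$$\tilde\mu_t(x_t, x_0) = \frac{\sqrt{\bar\alpha_{t-1}}\, \beta_t}{1 - \bar\alpha_t}\, x_0 + \frac{\sqrt{\alpha_t}\, (1 - \bar\alpha_{t-1})}{1 - \bar\alpha_t}\, x_t, \quad \tilde\beta_t = \frac{1 - \bar\alpha_{t-1}}{1 - \bar\alpha_t}\, \beta_t$$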
In DDPM, the variance of the forward noising process is fixed, so the forward process has no trainable parameters.
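The single-step reverse of the forward process, conditioned on $x_0$, is a Gaussian whose mean and variance depend only on the fixed schedule. The sketch below computes them under an assumed linear schedule; `q_posterior` is a hypothetical helper name, and array index `t` is 0-based (index 0 is step 1, where $\bar\alpha_0$ is taken to be 1).

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)  # assumed linear variance schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)
alpha_bars_prev = np.append(1.0, alpha_bars[:-1])  # alpha_bar_{t-1}

def q_posterior(x0, xt, t):
    """Mean and variance of q(x_{t-1} | x_t, x_0) for 0-based step index t."""
    coef_x0 = np.sqrt(alpha_bars_prev[t]) * betas[t] / (1.0 - alpha_bars[t])
    coef_xt = np.sqrt(alphas[t]) * (1.0 - alpha_bars_prev[t]) / (1.0 - alpha_bars[t])
    mean = coef_x0 * x0 + coef_xt * xt
    var = (1.0 - alpha_bars_prev[t]) / (1.0 - alpha_bars[t]) * betas[t]
    return mean, var

x0 = np.array([0.5, -1.0])
xt = np.array([0.1, 0.3])
mean, var = q_posterior(x0, xt, 500)
print(mean, var)
```

Nothing here is learned: every coefficient comes from `betas`, which is why only the reverse model needs training.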
Training
Loss

The first inequality follows from Jensen’s Inequality.
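(I ran out of steam writing in English; the notes below were originally written in Chinese.)
Apply Bayes' rule to the denominator inside the log term. To compare the diffusion and denoising processes at time step $t$, "reverse" the conditional probabilities, simplify, and express the result in terms of KL divergences:
$$L = \mathbb{E}_q\Big[ D_{KL}\big(q(x_T \mid x_0) \,\|\, p(x_T)\big) + \sum_{t > 1} D_{KL}\big(q(x_{t-1} \mid x_t, x_0) \,\|\, p_\theta(x_{t-1} \mid x_t)\big) - \log p_\theta(x_0 \mid x_1) \Big]$$
In actual training, by Eq. (4), the sample at time step $t$ can be obtained directly from $x_0$, without simulating the intermediate steps.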
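Experiments show that training on a simplified objective, which drops the weighting coefficients of the variational bound, gives better sample quality:
$$L_{\text{simple}} = \mathbb{E}_{t, x_0, \epsilon}\Big[ \big\| \epsilon - \epsilon_\theta\big(\sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon,\ t\big) \big\|^2 \Big]$$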
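One evaluation of a simplified noise-prediction loss might look like the sketch below: pick a random time step per sample, noise $x_0$ directly to $x_t$, and score the predicted noise with a squared error. This is a minimal NumPy illustration rather than a training loop. `eps_model` is a hypothetical placeholder for the noise-prediction network $\epsilon_\theta$, and the linear schedule is an assumption.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)  # assumed linear variance schedule
alpha_bars = np.cumprod(1.0 - betas)

def eps_model(x_t, t):
    """Hypothetical stand-in for eps_theta(x_t, t); always predicts zero noise."""
    return np.zeros_like(x_t)

def simple_loss(x0, model, rng):
    """Unweighted squared error between true and predicted noise."""
    batch = x0.shape[0]
    t = rng.integers(0, T, size=batch)   # one random 0-based step per sample
    eps = rng.standard_normal(x0.shape)  # the noise the model must recover
    abar = alpha_bars[t].reshape(batch, *([1] * (x0.ndim - 1)))
    x_t = np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps  # direct jump to x_t
    return np.mean((eps - model(x_t, t)) ** 2)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((64, 8))        # toy "data" batch
loss_zero = simple_loss(x0, eps_model, rng)
print(loss_zero)
```

With the zero predictor the loss sits near 1, the variance of the unit Gaussian noise; a real network is trained to push it below that.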

Multiple Perspectives
Diffusion Probabilistic Models and Denoising Score Matching with Langevin dynamics
Lossy / Lossless Compression
Rate-distortion Behavior
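Thoughts / Takeaways
1. The Trade-off Between FID and NLL
- $L$ (full variational bound): In the mathematically rigorous variational-bound loss, each noise level (time step $t$) receives a different weight. To obtain a better likelihood (NLL), the loss forces the model to fit precisely the imperceptible, small-scale details at low noise levels (small $t$). This helps the "lossless codelength" but contributes little to visual perception.
- $L_{\text{simple}}$: This variant drops the complicated weighting coefficients, which amounts to reweighting across all time steps $t$. The simplification lets the model focus on the features that contribute most to image structure and visual quality, rather than chasing exact probabilistic alignment.
The loss function that best matches the mathematical definition does not necessarily produce the results humans judge visually best.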
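To make the reweighting concrete, the sketch below compares the per-step coefficient that the variational bound places on the noise-prediction error with the uniform weight of the simplified loss. It assumes the noise-prediction parameterization, a reverse variance $\sigma_t^2 = \beta_t$, and a linear schedule, none of which are stated in these notes. Under those assumptions the coefficient is $\beta_t / (2 \alpha_t (1 - \bar\alpha_t))$. For simplicity the same formula is also evaluated at $t = 1$, although that term is often handled separately.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)  # assumed linear variance schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

# Coefficient on ||eps - eps_theta||^2 in the variational bound,
# assuming the reverse-process variance is sigma_t^2 = beta_t.
vlb_weights = betas / (2.0 * alphas * (1.0 - alpha_bars))

# The simplified loss uses the same weight for every time step.
simple_weights = np.ones(T)

for step in (1, 2, 10, 100, 1000):
    print(step, vlb_weights[step - 1], simple_weights[step - 1])
```

Under these assumptions the variational-bound weight is largest at the smallest time steps, so replacing it with a uniform weight shifts emphasis away from the lowest-noise steps.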