DDPM

Symbols

$x_0$: data
$x_1, \dots, x_T$: latent variables
$\beta_t$: variance of the diffusion process at step $t$

Key Concepts

Gaussian Distribution

1. Probability Density Function (PDF)

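For a continuous random variable $x$ with mean $\mu$ and variance $\sigma^2$, the formula is:

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$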

2. Multivariate Gaussian Distribution

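Used when dealing with multiple correlated variables (common in 3D vision and diffusion models):

$$f(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)$$

  • $d$: Number of dimensions.

  • $\mathbf{x}$: $d$-dimensional vector.

  • $\boldsymbol{\mu}$: Mean vector.

  • $\Sigma$: Covariance matrix (defines the shape and orientation in multi-dimensional space).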

Addition of Gaussian Random Variables

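Consider two independent Gaussian random variables:

$$X \sim \mathcal{N}(\mu_1, \sigma_1^2), \quad Y \sim \mathcal{N}(\mu_2, \sigma_2^2)$$

  • Their sum $Z = X + Y$ still follows a Gaussian distribution:
    Mean: $\mu_Z = \mu_1 + \mu_2$
    Variance: $\sigma_Z^2 = \sigma_1^2 + \sigma_2^2$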
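As a quick numerical sanity check of the mean and variance rules above, here is a minimal Monte Carlo sketch using only the Python standard library. The function name `sample_sum` and the particular parameter values are illustrative assumptions, not part of the original post.

```python
import random

def sample_sum(mu1, sigma1, mu2, sigma2, n, seed=0):
    """Draw n samples of Z = X + Y for independent
    X ~ N(mu1, sigma1^2) and Y ~ N(mu2, sigma2^2)."""
    rng = random.Random(seed)
    return [rng.gauss(mu1, sigma1) + rng.gauss(mu2, sigma2) for _ in range(n)]

# Illustrative parameters (assumed for this example only).
samples = sample_sum(mu1=1.0, sigma1=2.0, mu2=-3.0, sigma2=1.5, n=100_000)

empirical_mean = sum(samples) / len(samples)
empirical_var = sum((s - empirical_mean) ** 2 for s in samples) / len(samples)

print(empirical_mean)  # should be near mu1 + mu2 = -2.0
print(empirical_var)   # should be near sigma1^2 + sigma2^2 = 6.25
```

The estimates only approach the theoretical values as the sample count grows, so expect small deviations for any finite `n`.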

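Proof of Variance

For independent $X$ and $Y$ with means $\mu_1, \mu_2$ and variances $\sigma_1^2, \sigma_2^2$:

$$\mathrm{Var}(X+Y) = \mathbb{E}\left[(X - \mu_1 + Y - \mu_2)^2\right] = \mathbb{E}\left[(X-\mu_1)^2\right] + \mathbb{E}\left[(Y-\mu_2)^2\right] + 2\,\mathbb{E}\left[(X-\mu_1)(Y-\mu_2)\right] = \sigma_1^2 + \sigma_2^2$$

The cross term vanishes because independence gives $\mathbb{E}[(X-\mu_1)(Y-\mu_2)] = \mathbb{E}[X-\mu_1]\,\mathbb{E}[Y-\mu_2] = 0$.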

KL divergence

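Kullback-Leibler Divergence:

$$D_{KL}(p \parallel q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx$$

If $p$ and $q$ are both $d$-dimensional Gaussian distributions:

$$p = \mathcal{N}(\mu_1, \Sigma_1), \quad q = \mathcal{N}(\mu_2, \Sigma_2)$$

then:

$$D_{KL}(p \parallel q) = \frac{1}{2}\left[\log\frac{|\Sigma_2|}{|\Sigma_1|} - d + \mathrm{tr}\left(\Sigma_2^{-1}\Sigma_1\right) + (\mu_2-\mu_1)^\top \Sigma_2^{-1} (\mu_2-\mu_1)\right]$$

Furthermore, in DDPM, $\Sigma_1 = \Sigma_2 = \sigma_t^2 I$, so the log-determinant term is zero and the trace term cancels $d$:

$$D_{KL}(p \parallel q) = \frac{1}{2\sigma_t^2}\,\|\mu_1 - \mu_2\|^2$$

which is the same as MSE between the two means, up to the constant factor $\frac{1}{2\sigma_t^2}$.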
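The reduction to a scaled squared error can be checked numerically. The sketch below assumes diagonal covariances (passed as lists of per-dimension variances); the helper name `kl_diag_gaussians` and the example values are hypothetical, chosen only for illustration.

```python
import math

def kl_diag_gaussians(mu1, var1, mu2, var2):
    """KL( N(mu1, diag(var1)) || N(mu2, diag(var2)) ).

    For diagonal covariances the general formula factorizes
    into a sum of one-dimensional terms.
    """
    total = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        total += 0.5 * (math.log(v2 / v1) - 1.0 + v1 / v2 + (m2 - m1) ** 2 / v2)
    return total

# Illustrative means and a shared variance sigma^2 (assumed values).
mu1 = [0.5, -1.0, 2.0]
mu2 = [0.0, 1.0, 2.5]
sigma2 = 0.3

kl = kl_diag_gaussians(mu1, [sigma2] * 3, mu2, [sigma2] * 3)

# With equal covariances sigma^2 * I, KL should equal ||mu1 - mu2||^2 / (2 sigma^2).
scaled_sq_err = sum((a - b) ** 2 for a, b in zip(mu1, mu2)) / (2.0 * sigma2)

print(kl, scaled_sq_err)
```

Both printed numbers should agree up to floating-point rounding, which is why the DDPM training terms can be written as squared errors between means.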

Reparameterization Trick

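In generative models (such as VAE and DDPM), we need to sample a variable $z$ from some distribution $p_\theta(z)$. The sampling operation itself is not differentiable, because it is random, so backpropagation breaks at the sampling layer.
The core idea of reparameterization is to decompose the random variable $z$ into a deterministic transformation and an independent standard noise variable $\epsilon$.

Suppose we have a Gaussian distribution with mean $\mu$ and variance $\sigma^2$: $z \sim \mathcal{N}(\mu, \sigma^2)$.
We can rewrite it as:

$$z = \mu + \sigma \cdot \epsilon, \quad \epsilon \sim \mathcal{N}(0, 1)$$

Two key changes happen here:

  1. Deterministic path: $\mu$ and $\sigma$ are now connected to $z$ through addition and multiplication. Since both operations are differentiable, gradients flow directly back to the $\mu$ and $\sigma$ produced by the network.

  2. Externalized noise: all the randomness is moved into $\epsilon$. When differentiating, $\epsilon$ is treated as a constant coefficient.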
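To make the two points above concrete, here is a minimal sketch that estimates gradients through a sample by hand, using only the standard library. It assumes a toy objective $f(z) = z^2$ (not from the original post), for which $\mathbb{E}[z^2] = \mu^2 + \sigma^2$, so the exact gradients are $2\mu$ and $2\sigma$. The function names are hypothetical.

```python
import random

def reparam_sample(mu, sigma, eps):
    """z = mu + sigma * eps: deterministic in (mu, sigma) once eps is drawn."""
    return mu + sigma * eps

def pathwise_grads(mu, sigma, n=100_000, seed=0):
    """Monte Carlo estimate of d/dmu and d/dsigma of E[z^2]."""
    rng = random.Random(seed)
    g_mu = 0.0
    g_sigma = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)        # all randomness lives in eps
        z = reparam_sample(mu, sigma, eps)
        df_dz = 2.0 * z                  # derivative of the toy objective z^2
        g_mu += df_dz * 1.0              # chain rule: dz/dmu = 1
        g_sigma += df_dz * eps           # chain rule: dz/dsigma = eps (a constant)
    return g_mu / n, g_sigma / n

grad_mu, grad_sigma = pathwise_grads(mu=1.5, sigma=0.8)
print(grad_mu)     # should be near 2 * mu = 3.0
print(grad_sigma)  # should be near 2 * sigma = 1.6
```

In practice an autodiff framework applies the same chain rule automatically; the manual version is only meant to show why treating $\epsilon$ as a constant makes the sample differentiable in $\mu$ and $\sigma$.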

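In DDPM:

$$x_t = \sqrt{\alpha_t}\, x_{t-1} + \sqrt{1-\alpha_t}\, \epsilon_{t-1}$$

where $\alpha_t = 1 - \beta_t$ and $\epsilon_{t-1} \sim \mathcal{N}(0, I)$.
Unrolling the recursion gives $x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1-\bar\alpha_t}\, \epsilon$ (where $\bar\alpha_t = \prod_{s=1}^{t} \alpha_s$, $\epsilon \sim \mathcal{N}(0, I)$).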
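The recursion can be verified without sampling by tracking two numbers: the coefficient multiplying $x_0$ and the variance of the accumulated Gaussian noise. The sketch below assumes a linear schedule from $10^{-4}$ to $0.02$ over $T = 1000$ steps (the values used in the DDPM paper, but not stated in this post); the function names are hypothetical.

```python
import math

def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced variances beta_1..beta_T (assumed schedule)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]

def iterate_forward_coeffs(betas):
    """Apply x_t = sqrt(alpha_t) x_{t-1} + sqrt(1 - alpha_t) eps step by step.

    Returns (c, v) such that x_T = c * x_0 + noise with variance v,
    using the rule that variances of independent Gaussians add.
    """
    c, v = 1.0, 0.0
    for beta in betas:
        alpha = 1.0 - beta
        c = math.sqrt(alpha) * c          # signal is scaled by sqrt(alpha_t)
        v = alpha * v + (1.0 - alpha)     # old noise is scaled, new noise is added
    return c, v

betas = linear_betas(T=1000)
alpha_bar = math.prod(1.0 - b for b in betas)  # cumulative product of alpha_s

c, v = iterate_forward_coeffs(betas)
print(c, math.sqrt(alpha_bar))   # the two values should match
print(v, 1.0 - alpha_bar)        # the two values should match
```

Under this assumed schedule $\bar\alpha_T$ is very small, so $x_T$ is close to pure standard Gaussian noise.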

Variational Inference (VI)

1. Motivation: The Intractability Problem

In Bayesian inference, we aim to compute the Posterior Distribution:

$$p(z \mid x) = \frac{p(x \mid z)\, p(z)}{p(x)} = \frac{p(x \mid z)\, p(z)}{\int p(x \mid z)\, p(z)\, dz}$$

  • The Issue: For complex models, the evidence $p(x)$ (the integral) is intractable to compute because the latent space is high-dimensional.

  • The Goal of VI: Instead of computing $p(z \mid x)$ exactly, we approximate it with a simpler distribution $q(z)$ from a tractable family $\mathcal{Q}$.


2. Core Strategy: Inference as Optimization

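VI transforms the integration problem into a functional optimization problem:

Find $q^*(z) \in \mathcal{Q}$ that is most similar to $p(z \mid x)$.

The “similarity” is measured by Kullback-Leibler (KL) Divergence:

$$q^*(z) = \arg\min_{q \in \mathcal{Q}} D_{KL}\left(q(z) \parallel p(z \mid x)\right)$$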


3. The Evidence Lower Bound (ELBO)

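Since we cannot compute $p(z \mid x)$ directly, we cannot minimize the KL divergence directly. We use the Evidence Decomposition:

$$\log p(x) = \mathrm{ELBO}(q) + D_{KL}\left(q(z) \parallel p(z \mid x)\right)$$

Where the ELBO is defined as:

$$\mathrm{ELBO}(q) = \mathbb{E}_{q(z)}\left[\log p(x, z)\right] - \mathbb{E}_{q(z)}\left[\log q(z)\right]$$

  • Key Logic: Since $\log p(x)$ is fixed and $D_{KL} \ge 0$, maximizing ELBO is equivalent to minimizing KL divergence.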
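The evidence decomposition can be checked exactly on a tiny discrete model where every quantity is computable. The sketch below assumes a binary latent $z$ and a single fixed observation; all probability values are made up for illustration.

```python
import math

# Toy model with a binary latent z (all numbers are assumed for illustration).
prior = {0: 0.6, 1: 0.4}        # p(z)
likelihood = {0: 0.2, 1: 0.7}   # p(x = x_obs | z) for one fixed observation
q = {0: 0.5, 1: 0.5}            # an arbitrary approximate posterior q(z)

# Evidence p(x) = sum_z p(x | z) p(z); tractable here because z is tiny.
evidence = sum(prior[z] * likelihood[z] for z in prior)

# Exact posterior p(z | x) by Bayes' rule.
posterior = {z: prior[z] * likelihood[z] / evidence for z in prior}

# ELBO = E_q[log p(x, z)] - E_q[log q(z)]
elbo = sum(q[z] * (math.log(prior[z] * likelihood[z]) - math.log(q[z])) for z in q)

# KL(q(z) || p(z | x))
kl = sum(q[z] * math.log(q[z] / posterior[z]) for z in q)

print(math.log(evidence))  # log p(x)
print(elbo + kl)           # should equal log p(x)
```

Changing `q` moves value between `elbo` and `kl` while their sum stays at $\log p(x)$, which is the "Key Logic" above in numeric form.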

4. Two Practical Interpretations of ELBO

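To implement VI in neural networks (like VAE or DDPM), we rewrite ELBO:

$$\mathrm{ELBO}(q) = \mathbb{E}_{q(z)}\left[\log p(x \mid z)\right] - D_{KL}\left(q(z) \parallel p(z)\right)$$

| Component | Function |
| --- | --- |
| Reconstruction: $\mathbb{E}_{q(z)}[\log p(x \mid z)]$ | Maximizes the likelihood that latent $z$ can recover data $x$. |
| Regularizer: $D_{KL}(q(z) \parallel p(z))$ | Forces the approximate posterior $q(z)$ to stay close to the prior $p(z)$. |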
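For completeness, the step from the definition of the ELBO to the reconstruction/regularizer form is a short derivation. It uses only the factorization $p(x, z) = p(x \mid z)\, p(z)$; the notation $q(z)$ for the approximate posterior is assumed to match the rest of this section.

```latex
\begin{aligned}
\mathrm{ELBO}(q)
  &= \mathbb{E}_{q(z)}\left[\log p(x, z)\right] - \mathbb{E}_{q(z)}\left[\log q(z)\right] \\
  &= \mathbb{E}_{q(z)}\left[\log p(x \mid z)\right]
     + \mathbb{E}_{q(z)}\left[\log p(z)\right]
     - \mathbb{E}_{q(z)}\left[\log q(z)\right] \\
  &= \mathbb{E}_{q(z)}\left[\log p(x \mid z)\right]
     - \mathbb{E}_{q(z)}\left[\log \frac{q(z)}{p(z)}\right] \\
  &= \underbrace{\mathbb{E}_{q(z)}\left[\log p(x \mid z)\right]}_{\text{reconstruction}}
     - \underbrace{D_{KL}\left(q(z) \parallel p(z)\right)}_{\text{regularizer}}
\end{aligned}
```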

Pipeline


Inverse Process

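In DDPM, the reverse (denoising) process is a Markov chain with learned Gaussian transitions, starting from $p(x_T) = \mathcal{N}(x_T; 0, I)$:

$$p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t), \quad p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t)\right)$$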

Forward Process

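The forward (diffusion) process is a fixed Markov chain that gradually adds Gaussian noise according to the variances $\beta_1, \dots, \beta_T$:

$$q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1}), \quad q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t; \sqrt{1-\beta_t}\, x_{t-1}, \beta_t I\right)$$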

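Note that $x_t$ at an arbitrary timestep $t$ can be expressed in terms of $x_0$ and $\beta_1, \dots, \beta_t$ by the Markov chain property. With $\alpha_t = 1 - \beta_t$ and $\bar\alpha_t = \prod_{s=1}^{t} \alpha_s$:

$$q(x_t \mid x_0) = \mathcal{N}\left(x_t; \sqrt{\bar\alpha_t}\, x_0, (1-\bar\alpha_t) I\right) \tag{4}$$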

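By Bayes' rule, the single-step reverse of the forward process, conditioned on $x_0$, is (with $\alpha_t = 1 - \beta_t$ and $\bar\alpha_t = \prod_{s=1}^{t} \alpha_s$):

$$q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\left(x_{t-1}; \tilde\mu_t(x_t, x_0), \tilde\beta_t I\right)$$

$$\tilde\mu_t(x_t, x_0) = \frac{\sqrt{\bar\alpha_{t-1}}\, \beta_t}{1-\bar\alpha_t}\, x_0 + \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\, x_t, \quad \tilde\beta_t = \frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\, \beta_t$$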

In DDPM, the variances $\beta_t$ of the forward (noising) process are fixed, so the forward process has no trainable parameters.

Training

Loss

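Training optimizes the variational bound on the negative log-likelihood:

$$\mathbb{E}\left[-\log p_\theta(x_0)\right] \le \mathbb{E}_q\left[-\log \frac{p_\theta(x_{0:T})}{q(x_{1:T} \mid x_0)}\right] = \mathbb{E}_q\left[-\log p(x_T) - \sum_{t \ge 1} \log \frac{p_\theta(x_{t-1} \mid x_t)}{q(x_t \mid x_{t-1})}\right] =: L$$

The inequality follows from Jensen’s Inequality.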

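Apply Bayes' rule to the denominator inside the log term. To compare the diffusion and denoising processes at timestep $t$, we "reverse" the forward conditionals, simplify, and write the result in terms of KL divergences:

$$L = \mathbb{E}_q\left[D_{KL}\left(q(x_T \mid x_0) \parallel p(x_T)\right) + \sum_{t > 1} D_{KL}\left(q(x_{t-1} \mid x_t, x_0) \parallel p_\theta(x_{t-1} \mid x_t)\right) - \log p_\theta(x_0 \mid x_1)\right]$$

In actual training, by Eq. (4), $x_t$ at timestep $t$ can be expressed directly in terms of $x_0$, namely $x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1-\bar\alpha_t}\, \epsilon$. The model is therefore parameterized to predict the noise $\epsilon$, and the term for timestep $t$ becomes:

$$\mathbb{E}_{x_0, \epsilon}\left[\frac{\beta_t^2}{2\sigma_t^2\, \alpha_t (1-\bar\alpha_t)} \left\|\epsilon - \epsilon_\theta\left(\sqrt{\bar\alpha_t}\, x_0 + \sqrt{1-\bar\alpha_t}\, \epsilon,\ t\right)\right\|^2\right]$$

Experiments show that dropping the weighting coefficient gives better sample quality:

$$L_{\text{simple}}(\theta) = \mathbb{E}_{t, x_0, \epsilon}\left[\left\|\epsilon - \epsilon_\theta\left(\sqrt{\bar\alpha_t}\, x_0 + \sqrt{1-\bar\alpha_t}\, \epsilon,\ t\right)\right\|^2\right]$$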
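As a rough illustration of how the noise-prediction objective is evaluated, the following standard-library sketch draws a random timestep and noise vector, forms $x_t$ in closed form, and scores a noise predictor with a squared error. Everything here is an assumption for illustration: the linear schedule, the helper names, and especially `zero_model`, a hypothetical stand-in for the network $\epsilon_\theta$ that always predicts zero.

```python
import math
import random

def make_alpha_bars(T, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for an assumed linear beta schedule."""
    alpha_bars, running = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def simple_loss(x0, eps_model, alpha_bars, rng):
    """One Monte Carlo sample of ||eps - eps_model(x_t, t)||^2, averaged over dimensions."""
    t = rng.randrange(len(alpha_bars))            # uniform timestep index
    eps = [rng.gauss(0.0, 1.0) for _ in x0]       # target noise
    a = alpha_bars[t]
    # Closed-form forward sample: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    pred = eps_model(x_t, t)
    return sum((e - p) ** 2 for e, p in zip(eps, pred)) / len(x0)

def zero_model(x_t, t):
    """Hypothetical placeholder for eps_theta: always predicts zero noise."""
    return [0.0 for _ in x_t]

rng = random.Random(0)
alpha_bars = make_alpha_bars(T=1000)
x0 = [0.3, -1.2, 0.7, 2.0]                        # made-up data point
losses = [simple_loss(x0, zero_model, alpha_bars, rng) for _ in range(20_000)]
avg_loss = sum(losses) / len(losses)
print(avg_loss)  # near 1.0: predicting zero leaves the full unit noise variance
```

A trained predictor would be expected to push this average well below the baseline of about 1; a real implementation would use a neural network and gradient descent rather than this fixed placeholder.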


Understanding from Multiple Perspectives

Diffusion Probabilistic Models and Denoising Score Matching with Langevin dynamics

Lossy / Lossless Compression

Rate-distortion Behavior

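Reflections / Takeaways

1. The trade-off between FID and NLL

  • $L$ (the full variational bound): in the mathematically rigorous variational-bound loss, different noise levels (timesteps $t$) receive different weights. To achieve a better likelihood (NLL), the loss pushes the model to fit low-noise (small $t$), fine-scale details precisely. This helps the "lossless codelength" but contributes little to visual perception.

  • $L_{\text{simple}}$: this objective drops the complicated weighting coefficients, which is essentially a reweighting across all timesteps $t$. The simplification lets the model focus on the features that contribute most to image structure and visual quality, rather than pursuing an exact probabilistic match at any cost.

The loss function that best matches the mathematical definition does not necessarily produce the results that humans judge to be visually best.

2. Diffusion models are good lossy compressors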

Author: Jnyau Zhneg

Link: /posts/d0a8/

License: This article is licensed under CC BY-NC-SA 4.0
