Key insight: Score Distillation is DDIM with a change of variable.
TL;DR
Score Distillation Sampling (SDS) is a promising technique that makes it possible to use pre-trained 2D diffusion models for 3D generation. However, the quality of the generated 3D assets is limited.
In this work we:
- 🔎 Theoretically show that SDS ≈ 2D Diffusion
- 🚨 Reveal that the noise term in SDS is the reason for over-smoothing
- 🛠️ Suggest a fix
- ✅ Improve the quality of 3D generation

“A photograph of a ninja”
How does Score Distillation work?
Proposed in DreamFusion and Score Jacobian Chaining, Score Distillation is a method for generating 3D shapes using a pre-trained and frozen 2D diffusion model.
1. Initialize a differentiable 3D representation
2. Sample a random camera pose 📷
3. Render a view of the object 🖼️
4. Add noise to the rendering 🌫️
5. Denoise the image with the 2D diffusion model
6. Optimize the parameters of the 3D representation to match the denoised image 🎯
7. Repeat steps 2-6 until convergence 🔁
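The loop above can be sketched in a few lines of plain Python. This is a toy illustration under loud assumptions, not the DreamFusion implementation: the "3D representation" is a list of four numbers, each "camera" sees one of them as a one-pixel image, and the frozen "diffusion model" (`predict_noise`) is the exact noise predictor for data that always equals `TARGET_PIXEL`. All names and the linear noise schedule are hypothetical.

```python
import math
import random

TARGET_PIXEL = 1.0  # toy assumption: the "diffusion model" believes every pixel equals 1.0


def alpha_bar(t):
    """Toy noise schedule: fraction of signal kept at time t in (0, 1)."""
    return 1.0 - t


def predict_noise(x_t, t):
    """Toy frozen denoiser: exact noise predictor when all data equals TARGET_PIXEL."""
    a = alpha_bar(t)
    return (x_t - math.sqrt(a) * TARGET_PIXEL) / math.sqrt(1.0 - a)


def render(theta, camera):
    """Toy renderer: each camera sees exactly one parameter as a one-pixel image."""
    return theta[camera]


def run_sds(num_steps=1000, lr=0.1, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0, 0.0]                    # 1. initialize the representation
    for _ in range(num_steps):                      # 7. repeat
        camera = rng.randrange(len(theta))          # 2. sample a random camera pose
        image = render(theta, camera)               # 3. render a view
        t = rng.uniform(0.02, 0.98)
        a = alpha_bar(t)
        eps = rng.gauss(0.0, 1.0)
        x_t = math.sqrt(a) * image + math.sqrt(1.0 - a) * eps         # 4. add noise
        eps_hat = predict_noise(x_t, t)
        x0_hat = (x_t - math.sqrt(1.0 - a) * eps_hat) / math.sqrt(a)  # 5. denoise
        # 6. gradient step on 0.5 * (image - x0_hat)**2 with x0_hat held fixed.
        # Note: image - x0_hat == sqrt((1 - a) / a) * (eps_hat - eps), i.e. the
        # familiar (eps_hat - eps) form of the SDS gradient up to a weighting.
        theta[camera] -= lr * (image - x0_hat)
    return theta
```

Calling `run_sds()` drives every parameter toward `TARGET_PIXEL`, because each rendered view is pulled toward what the toy model considers a clean image. In a real pipeline the renderer is a differentiable 3D model and the gradient in step 6 is back-propagated through it.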
Often the generated shapes are over-smoothed and over-saturated.
What do we propose?
In this work we start from the first principles of image generation with diffusion models. First, we consider the steps of DDIM, which gradually remove noise from the image. Each update can be seen as denoising the image all the way in a single step and then adding a portion of this noise back to the image. By reshuffling the order of these operations, we can define a dual process on the space of noise-free images.
For a formal derivation, see the full paper.
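To make the "denoise all the way, then add a portion of the noise back" reading concrete, here is a minimal sketch of one deterministic DDIM step in the common variance-preserving notation, which may differ from the notation in the paper. The cosine schedule, the constant `MU`, and the closed-form `predict_noise` (the exact noise predictor for toy data distributed as N(MU, 1)) are assumptions standing in for a trained network.

```python
import math

MU = 0.5  # toy assumption: data is distributed as N(MU, 1)


def alpha_bar(t):
    """Cosine noise schedule: fraction of signal kept at time t in [0, 1)."""
    return math.cos(0.5 * math.pi * t) ** 2


def predict_noise(x_t, t):
    """Exact noise predictor E[eps | x_t] for the toy N(MU, 1) data."""
    a = alpha_bar(t)
    return math.sqrt(1.0 - a) * (x_t - math.sqrt(a) * MU)


def ddim_step(x_t, t, t_next):
    """One deterministic DDIM step from time t to a less noisy time t_next."""
    a, a_next = alpha_bar(t), alpha_bar(t_next)
    eps_hat = predict_noise(x_t, t)
    # (1) Denoise all the way with a single-step update.
    x0_hat = (x_t - math.sqrt(1.0 - a) * eps_hat) / math.sqrt(a)
    # (2) Add a smaller portion of the same predicted noise back.
    return math.sqrt(a_next) * x0_hat + math.sqrt(1.0 - a_next) * eps_hat


def ddim_step_closed_form(x_t, t, t_next):
    """The same step written as a single affine update of x_t."""
    a, a_next = alpha_bar(t), alpha_bar(t_next)
    eps_hat = predict_noise(x_t, t)
    scale = math.sqrt(a_next / a)
    return scale * x_t + (math.sqrt(1.0 - a_next) - scale * math.sqrt(1.0 - a)) * eps_hat


def ddim_sample(x_start, t_start=0.9, num_steps=50):
    """Run DDIM from noise level t_start down to t = 0."""
    x = x_start
    for i in range(num_steps):
        t = t_start * (1 - i / num_steps)
        t_next = t_start * (1 - (i + 1) / num_steps)
        x = ddim_step(x, t, t_next)
    return x
```

The two-stage form and the closed form are algebraically the same update. The two-stage form is the one that exposes the intermediate noise-free estimate `x0_hat`, which is the variable the dual process is defined on.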
Good noise is all you need
Depending on the choice of the noise term, the dual process becomes identical to either Score Distillation or DDIM.
The dual process is equivalent to SDS when noise is sampled randomly:
Thus each SDS update step corresponds to a different DDIM trajectory. This inconsistency averages the final result across multiple trajectories and leads to blurriness.
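The role of the noise term can be illustrated with a small sketch in which one update rule on noise-free images takes the noise as an argument. It uses variance-preserving notation and a toy closed-form denoiser for N(MU, 1) data; the names `dual_step`, `dual_trajectory`, and `sds_like_step` are hypothetical, and the step weighting is the one implied by DDIM, whereas SDS in practice uses its own weighting and learning rate.

```python
import math
import random

MU = 0.5  # toy assumption: data is distributed as N(MU, 1)


def alpha_bar(t):
    """Cosine noise schedule: fraction of signal kept at time t in [0, 1)."""
    return math.cos(0.5 * math.pi * t) ** 2


def predict_noise(x_t, t):
    """Exact noise predictor E[eps | x_t] for the toy N(MU, 1) data."""
    a = alpha_bar(t)
    return math.sqrt(1.0 - a) * (x_t - math.sqrt(a) * MU)


def dual_step(x0, kappa, t):
    """Update a noise-free image x0 at time t, given a noise term kappa."""
    a = alpha_bar(t)
    x_t = math.sqrt(a) * x0 + math.sqrt(1.0 - a) * kappa  # re-noise x0 with kappa
    eps_hat = predict_noise(x_t, t)
    x0_new = x0 - math.sqrt((1.0 - a) / a) * (eps_hat - kappa)
    return x0_new, eps_hat


def ddim_denoised_trajectory(x_start, ts):
    """Ordinary DDIM on noisy images, recording the denoised estimate at each time."""
    x, out = x_start, []
    for i, t in enumerate(ts):
        a = alpha_bar(t)
        eps_hat = predict_noise(x, t)
        x0_hat = (x - math.sqrt(1.0 - a) * eps_hat) / math.sqrt(a)
        out.append(x0_hat)
        if i + 1 < len(ts):
            a_next = alpha_bar(ts[i + 1])
            x = math.sqrt(a_next) * x0_hat + math.sqrt(1.0 - a_next) * eps_hat
    return out


def dual_trajectory(x_start, ts):
    """The same trajectory computed only through updates of the noise-free image."""
    a = alpha_bar(ts[0])
    kappa = predict_noise(x_start, ts[0])
    x0 = (x_start - math.sqrt(1.0 - a) * kappa) / math.sqrt(a)
    out = [x0]
    for t in ts[1:]:
        # DDIM choice: the noise term is the model's previous noise prediction.
        x0, kappa = dual_step(x0, kappa, t)
        out.append(x0)
    return out


def sds_like_step(x0, t, rng):
    """SDS choice: the noise term is a fresh Gaussian sample at every update."""
    x0_new, _ = dual_step(x0, rng.gauss(0.0, 1.0), t)
    return x0_new
```

With `ts = [0.9 * (1 - i / 20) for i in range(20)]`, `dual_trajectory(1.2, ts)` and `ddim_denoised_trajectory(1.2, ts)` agree up to floating-point error, while `sds_like_step` applies the same update with noise that is unrelated to any previous step.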
From the reparametrization intuition we know that the noise term should follow a specific structure:
In practice this equation is hard to solve, as it involves the inverse of the trained diffusion model. We suggest using DDIM inversion to find an approximate solution.
We obtain the noise term with DDIM inversion conditioned on the current rendering:
This anchors the score distillation process to DDIM trajectories that are consistent with each other, both in time and across the different rendering angles.
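As a rough sketch of the idea, DDIM inversion runs the deterministic DDIM update in the opposite direction, from a clean image toward higher noise levels, and the noise term can then be read off from the resulting latent. The code below is a toy under stated assumptions, not the paper's implementation: it works on a single scalar "rendering", uses a closed-form denoiser for N(MU, 1) data with no text conditioning, and `inverted_noise` shows only one plausible way to extract the noise term (variance-preserving notation, requires t > 0).

```python
import math

MU = 0.5  # toy assumption: data is distributed as N(MU, 1)


def alpha_bar(t):
    """Cosine noise schedule: fraction of signal kept at time t in [0, 1)."""
    return math.cos(0.5 * math.pi * t) ** 2


def predict_noise(x_t, t):
    """Exact noise predictor E[eps | x_t] for the toy N(MU, 1) data."""
    a = alpha_bar(t)
    return math.sqrt(1.0 - a) * (x_t - math.sqrt(a) * MU)


def ddim_move(x, t, t_to):
    """Deterministic DDIM update from time t to time t_to (either direction)."""
    a, a_to = alpha_bar(t), alpha_bar(t_to)
    eps_hat = predict_noise(x, t)
    x0_hat = (x - math.sqrt(1.0 - a) * eps_hat) / math.sqrt(a)
    return math.sqrt(a_to) * x0_hat + math.sqrt(1.0 - a_to) * eps_hat


def ddim_invert(x0, t_target, num_steps=1000):
    """Walk a clean image up to noise level t_target with reversed DDIM steps."""
    x = x0
    for i in range(num_steps):
        x = ddim_move(x, t_target * i / num_steps, t_target * (i + 1) / num_steps)
    return x


def ddim_sample(x_t, t_start, num_steps=1000):
    """Walk a noisy latent back down to t = 0 with ordinary DDIM steps."""
    x = x_t
    for i in range(num_steps, 0, -1):
        x = ddim_move(x, t_start * i / num_steps, t_start * (i - 1) / num_steps)
    return x


def inverted_noise(rendering, t, num_steps=1000):
    """Noise term kappa such that x_t = sqrt(a) * rendering + sqrt(1 - a) * kappa."""
    a = alpha_bar(t)
    x_t = ddim_invert(rendering, t, num_steps)
    return (x_t - math.sqrt(a) * rendering) / math.sqrt(1.0 - a)
```

Two properties matter here. The noise term is a deterministic function of the rendering, so repeated updates on the same view no longer jump between unrelated trajectories. And sampling back from the inverted latent approximately recovers the rendering, with an error that shrinks as the number of inversion steps grows, which is why inversion gives only an approximate solution.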
Results

“A DSLR photograph of a freshly baked round loaf of sourdough bread”

“Robotic bee high detail”

“Pumpkin head zombie, skinny, highly detailed”

“A ripe strawberry”

“An ice cream sundae”

“A photograph of a knight”
See the paper for detailed comparisons with ProlificDreamer, Noise-Free SD, HiFA, Lucid Dreamer, and other amazing works in Score Distillation.