Wasserstein Distance
Also known as the Kantorovich–Rubinstein metric or the Earth Mover’s Distance; it comes from the theory of optimal transport.
The Wasserstein distance is a way of measuring the distance between two probability distributions. Rather than comparing the vertical (y) difference between the distributions at each location, it looks at the cheapest way to turn one distribution into the other. One can think of each distribution as a pile of dirt: the 1-Wasserstein distance is the minimum cost of transporting the dirt in one pile so that its shape matches the other pile.
The way we calculate the cost of moving dirt is similar to how work is calculated in physics: distance times amount.
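As a minimal sketch of the "distance times amount" rule, the snippet below computes the work of two candidate moving plans between the same pair of piles. The function name `transport_cost`, the `(source, destination, amount)` tuple layout, and the pile positions are all illustrative assumptions, not part of any library.

```python
def transport_cost(moves):
    """Total work of a moving plan given as (source, destination, amount) tuples."""
    return sum(abs(destination - source) * amount
               for source, destination, amount in moves)


# Hypothetical setup: one unit of dirt at positions 0 and 2,
# to be rearranged into one unit at positions 1 and 3.
plan_a = [(0, 1, 1.0), (2, 3, 1.0)]  # each pile moves one step to the right
plan_b = [(0, 3, 1.0), (2, 1, 1.0)]  # the piles cross over each other

cost_a = transport_cost(plan_a)  # 1*1 + 1*1 = 2.0
cost_b = transport_cost(plan_b)  # 3*1 + 1*1 = 4.0

print(cost_a, cost_b)
```

Both plans produce the same final arrangement, but plan A does less work. The Wasserstein distance corresponds to the cost of the cheapest such plan.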
For continuous distributions, calculus kicks in, as we now have infinitely small piles. For distributions $u$ and $v$, the 1-Wasserstein distance is:
$$d_{w_1}(u, v)=\inf_{\pi \in \Gamma(u, v)} \int_{\mathbb{R} \times \mathbb{R}} d(u-v) \, d \pi(u, v)$$
- $\displaystyle\inf _{\pi \in \Gamma(u, v)}$ takes the infimum (greatest lower bound) of the total cost over the set $\Gamma(u, v)$ of all possible moving plans, i.e. the cost of the cheapest plan
- $\displaystyle\int_{\mathbb{R} \times \mathbb{R}}$ sums all the work needed to transport the dirt
- $\pi(u, v)$ is a joint distribution whose marginals are $u$ and $v$, i.e. one candidate moving plan. The way it couples the two distributions indicates how the dirt piles should be transported; the infimum selects the optimal plan.
- $d(u-v)$ is the distance between the original and the new location of a pile of dirt
- $d \pi(u, v)$ is the amount of dirt in the infinitesimally small pile being moved
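For distributions on the real line, the infimum does not have to be searched for explicitly: the 1-Wasserstein distance equals the area between the two cumulative distribution functions. The sketch below relies on that one-dimensional identity (not stated above) for discrete distributions given as positions and weights. The function name `wasserstein_1d` and its arguments are illustrative assumptions, and the weights of each distribution are assumed to sum to the same total.

```python
def wasserstein_1d(u_positions, u_weights, v_positions, v_weights):
    """1-Wasserstein distance between two discrete distributions on the real line.

    Uses the 1D identity W1 = integral of |CDF_u(x) - CDF_v(x)| dx.
    Assumes both weight lists sum to the same total mass.
    """
    # Accumulate the mass located at each position.
    u_mass, v_mass = {}, {}
    for position, weight in zip(u_positions, u_weights):
        u_mass[position] = u_mass.get(position, 0.0) + weight
    for position, weight in zip(v_positions, v_weights):
        v_mass[position] = v_mass.get(position, 0.0) + weight

    # Both CDFs are constant between consecutive support points.
    points = sorted(set(u_mass) | set(v_mass))
    cdf_u = cdf_v = 0.0
    total = 0.0
    for left, right in zip(points, points[1:]):
        cdf_u += u_mass.get(left, 0.0)
        cdf_v += v_mass.get(left, 0.0)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


# Half of the mass at 0 and 2, versus half of the mass at 1 and 3.
distance = wasserstein_1d([0, 2], [0.5, 0.5], [1, 3], [0.5, 0.5])
print(distance)
```

In practice a library routine such as `scipy.stats.wasserstein_distance` may be preferable; this version is only meant to make the formula concrete.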
The Wasserstein distance is used as the training objective in Wasserstein GANs.