Generative Adversarial Networks
01 Background
Generative Adversarial Networks (GANs) are widely applied in vision tasks, such as Image Synthesis, Video In-painting, and Visual Style Transfer. A typical image-generation GAN contains a generator network (G) and a discriminator network (D). G is trained to generate fake images that can fool D, while D aims to distinguish real images from synthetic ones. For example, in image synthesis, G maps a random noise vector to an image and D scores how likely that image is to be real, as sketched below.
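As a concrete illustration, here is a minimal PyTorch sketch of the two networks. The MLP backbone, layer sizes, and 100-dimensional noise vector are illustrative assumptions, not a prescribed architecture; real GANs typically use deeper, often convolutional, backbones.

```python
import torch
import torch.nn as nn

latent_dim, img_dim = 100, 28 * 28  # assumed noise size and flattened 28x28 images

# Generator G: maps a noise vector z to a fake image.
generator = nn.Sequential(
    nn.Linear(latent_dim, 256),
    nn.ReLU(),
    nn.Linear(256, img_dim),
    nn.Tanh(),  # pixel values scaled to [-1, 1]
)

# Discriminator D: maps an image to the probability that it is real.
discriminator = nn.Sequential(
    nn.Linear(img_dim, 256),
    nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
    nn.Sigmoid(),
)

z = torch.randn(16, latent_dim)      # a batch of random noise samples
fake_images = generator(z)           # G(z)
p_real = discriminator(fake_images)  # D(G(z)): probability each fake is "real"
```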
The following formula, also called the adversarial loss, can be used to describe the process:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log{D(x)}] + \mathbb{E}_{z \sim p_z(z)}[\log{(1 - D(G(z)))}] \tag{1}
where
- x are the real training samples, drawn from the data distribution p_{data}(x)
- z are the random noise samples, drawn from a prior p_z(z)
- G(z) are the images generated by the neural network generator
- D(x) is the output of the discriminator, which specifies the probability of the input being real
In order to understand this formula, note that the discriminator is simply a classifier that performs a binary classification. Recall the binary cross-entropy loss function (written here without the usual leading minus sign, since the discriminator will maximize it):

L(\hat{y}, y) = y\log{\hat{y}} + (1 - y)\log{(1 - \hat{y})}

This implies that D(x) should be 1 when the discriminator is fed real training data x drawn from p_{data}(x), and that D(G(z)) should be 0 when it is fed a generated image. Evaluating the binary cross-entropy loss for both of these cases, we get the following values:

L(D(x), 1) = \log{D(x)}, \quad L(D(G(z)), 0) = \log{(1 - D(G(z)))}

Remember that the goal of the discriminator is to maximize these loss terms, i.e., to become better at discriminating between real and fake samples. This gives us the inner portion of formula (1).
If we want to maximize the discriminator's loss over a batch of samples, we arrive at the expectation notation:

\max_D \mathbb{E}_{x \sim p_{data}(x)}[\log{D(x)}] + \mathbb{E}_{z \sim p_z(z)}[\log{(1 - D(G(z)))}] \tag{2}
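Since maximizing (2) is the same as minimizing its negation, the discriminator's objective translates directly into a summed binary cross-entropy. A minimal sketch, assuming the `generator` and `discriminator` from the snippet above; the helper name and the `eps` stabilizer are my own additions:

```python
import torch

def discriminator_loss(discriminator, real_images, fake_images):
    # Minimizing this quantity is equivalent to maximizing (2).
    eps = 1e-8                                    # keeps the logs finite
    d_real = discriminator(real_images)           # D(x)
    d_fake = discriminator(fake_images.detach())  # D(G(z)); detach() freezes G
    # -(log D(x) + log(1 - D(G(z)))), averaged over the batch
    return -(torch.log(d_real + eps) + torch.log(1 - d_fake + eps)).mean()
```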
The generator, on the other hand, needs to trick the discriminator by generating images that are as realistic as possible. This means that generated images should pass through the discriminator and be labeled as 1; in other words, G wants to minimize the chance of an image being labeled fake (0). Looking at the formula, the generator wants to minimize

\log{(1 - D(G(z)))}
Or, for a batch:

\min_G \mathbb{E}_{z \sim p_z(z)}[\log{(1 - D(G(z)))}] \tag{3}
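The generator objective (3) translates to code the same way. A sketch of the literal form, with the same assumed helpers as above; note that the original paper already observes that log(1 − D(G(z))) saturates early in training, so implementations usually maximize log D(G(z)) instead:

```python
import torch

def generator_loss(discriminator, fake_images):
    # Literal form of (3): minimize log(1 - D(G(z))) over a batch.
    eps = 1e-8
    d_fake = discriminator(fake_images)  # no detach(): gradients must reach G
    return torch.log(1 - d_fake + eps).mean()
```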
Note that the generator is never fed real images, so the first term of (1) does not depend on G and can be ignored in the generator's objective. Putting (2) and (3) together recovers the full minimax formulation in (1).
To summarize, the discriminator's parameters (the weights of D) are updated to maximize the loss function, while the generator's parameters are updated to minimize it.
It is important to note that the generator and discriminator are not trained simultaneously but one at a time; while one network is being updated, the other is frozen, as in the sketch below.
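A sketch of this alternating schedule, assuming the networks and loss helpers from the snippets above plus a hypothetical `dataloader` yielding batches of flattened real images:

```python
import torch

opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)

for real_images in dataloader:  # `dataloader` is assumed, not defined here
    z = torch.randn(real_images.size(0), latent_dim)
    fake_images = generator(z)

    # 1) Discriminator step: G is effectively frozen because
    #    discriminator_loss() detaches the fake images.
    opt_d.zero_grad()
    discriminator_loss(discriminator, real_images, fake_images).backward()
    opt_d.step()

    # 2) Generator step: gradients flow through D into G, but only
    #    G's weights change because only opt_g.step() is called.
    opt_g.zero_grad()
    generator_loss(discriminator, fake_images).backward()
    opt_g.step()
```

The freezing is implicit here: detach() blocks the gradient path into G during the discriminator step, and each optimizer only updates its own network's parameters.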
This article on Medium (linked in the resources below) is a good review of the intuition behind GANs, the random noise vector z, and the formula described above.
02 Concepts
- There are two main directions in which to modify a GAN:
    - the loss function
    - the network backbone
03 Famous GAN Developments
GAN (2014)
Background
Paper: Ian J. Goodfellow et al., 2014, Generative Adversarial Networks
💡 Paper that first outlined the concept of a GAN.
Resources
- https://medium.com/analytics-vidhya/understanding-gans-deriving-the-adversarial-loss-from-scratch-ccd8b683d7e2
CGAN (2014)
Background
Paper: Mehdi Mirza and Simon Osindero, 2014, Conditional Generative Adversarial Nets
DCGAN (2015)
Paper: Alec Radford et al., 2015, Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
3D-GAN (2016)
Paper: Jiajun Wu et al., 2016, Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling