Single Image Super Resolution using a GAN

Super Resolution?

The highly challenging task of estimating a high-resolution (HR) image from its low-resolution (LR) counterpart is referred to as super-resolution (SR). The estimated image is called a super-resolved (SR) image.
An HR image is generally an upscaled version of the LR image that loses no features and, ideally, even sharpens some of them.

Here the left image is an SR image, which is almost indistinguishable from the original HR image.

Table of Contents:

  1. Why SRGAN?
  2. Network Architecture
  3. Loss Functions
  4. Images for training
  5. Training and Testing
  6. Result analysis
  7. Conclusion

1. Why SRGAN instead of older methods?

Before the SRGAN paper, methods such as interpolation and convolutional networks were used to generate an HR image from an LR image.
These methods calculate average values between pixels to upscale the image while trying not to lose its content. This approach produced decent results according to the loss functions previously used, such as MSE (mean squared error).
But as soon as the generated image is shown to a human, they can clearly spot its flaws, so relying on MSE as the error metric could no longer be accepted.

The SRGAN paper therefore proposed using a GAN instead of a purely convolutional model, splitting the loss into a content loss and an adversarial loss.

2. Network Architecture:

Like all GAN networks, the architecture is divided into two parts: a generator, which upscales the LR image, and a discriminator, which tries to tell generated SR images apart from real HR images.
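
As a rough guide, here is a minimal PyTorch sketch of both networks in the spirit of the paper. The block count, channel widths, and the pooled discriminator head are simplifying assumptions, not the exact implementation used here:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    # Residual block from the paper: conv -> BN -> PReLU -> conv -> BN, plus a skip connection.
    def __init__(self, channels=64):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.PReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)

class Generator(nn.Module):
    # Maps a 3 x 64 x 64 LR image to a 3 x 256 x 256 SR image (4x upscaling).
    def __init__(self, num_blocks=16):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(3, 64, kernel_size=9, padding=4), nn.PReLU())
        self.body = nn.Sequential(
            *[ResidualBlock() for _ in range(num_blocks)],
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
        )
        # Each conv + PixelShuffle(2) pair doubles the spatial resolution.
        self.upsample = nn.Sequential(
            nn.Conv2d(64, 256, kernel_size=3, padding=1), nn.PixelShuffle(2), nn.PReLU(),
            nn.Conv2d(64, 256, kernel_size=3, padding=1), nn.PixelShuffle(2), nn.PReLU(),
        )
        self.tail = nn.Conv2d(64, 3, kernel_size=9, padding=4)

    def forward(self, x):
        head = self.head(x)
        return self.tail(self.upsample(head + self.body(head)))

class Discriminator(nn.Module):
    # Strided conv blocks that shrink the image, then a small binary classifier.
    def __init__(self):
        super().__init__()
        def block(cin, cout, stride):
            return nn.Sequential(
                nn.Conv2d(cin, cout, kernel_size=3, stride=stride, padding=1),
                nn.BatchNorm2d(cout),
                nn.LeakyReLU(0.2),
            )
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.LeakyReLU(0.2),
            block(64, 64, 2), block(64, 128, 1), block(128, 128, 2),
            block(128, 256, 1), block(256, 256, 2),
            block(256, 512, 1), block(512, 512, 2),
            nn.AdaptiveAvgPool2d(1),  # pooled head; the paper flattens the feature map instead
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512, 1024), nn.LeakyReLU(0.2),
            nn.Linear(1024, 1), nn.Sigmoid(),  # probability the input is a real HR image
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```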

3. Loss Functions:

SRGAN uses a perceptual loss, which is a combination of a content loss and an adversarial loss:
$$l^{SR} = l^{SR}_{X} + 10^{-3}\, l^{SR}_{Gen}$$

  1. Content loss: For the content loss, the most commonly used method is MSE. However, optimizing purely for MSE results in very smooth textures.
    To overcome this problem, the VGG loss was introduced. With $\phi_{i,j}$ we denote the feature map obtained by the j-th convolution (after activation) before the i-th max-pooling layer within the VGG19 network, which we consider given. We then define the VGG loss as the Euclidean distance between the feature representations of a reconstructed image $G_{\theta_G}(I^{LR})$ and the reference image $I^{HR}$:
    $$l^{SR}_{VGG/i,j} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \left( \phi_{i,j}(I^{HR})_{x,y} - \phi_{i,j}\big(G_{\theta_G}(I^{LR})\big)_{x,y} \right)^2$$
    In conclusion, instead of directly taking the MSE between the SR and HR images, we take the MSE between the feature maps produced by the j-th convolution layer of VGG19.

  2. Adversarial loss: The generative loss is defined based on the probabilities of the discriminator $D_{\theta_D}(G_{\theta_G}(I^{LR}))$ over all $N$ training samples:
    $$l^{SR}_{Gen} = \sum_{n=1}^{N} -\log D_{\theta_D}\big(G_{\theta_G}(I^{LR})\big)$$
    The adversarial loss enters the perceptual loss with a weight of $10^{-3}$ (1/1000); a code sketch of the full perceptual loss follows this list.
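
A minimal PyTorch sketch of this perceptual loss. The content loss is computed here on the $\phi_{5,4}$ VGG19 features; the exact layer cut-off and the loss helpers are assumptions:

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19, VGG19_Weights

# Frozen VGG19 feature extractor; features[:36] ends at the activation of the
# last convolution before the 5th max-pooling layer (phi_{5,4} in the paper).
vgg = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).features[:36].eval()
for p in vgg.parameters():
    p.requires_grad = False

mse = nn.MSELoss()
bce = nn.BCELoss()

def perceptual_loss(discriminator, sr, hr):
    # Content loss: MSE between VGG19 feature maps rather than raw pixels.
    content = mse(vgg(sr), vgg(hr))
    # Adversarial loss: -log D(G(I_LR)), i.e. BCE against an all-ones target.
    pred = discriminator(sr)
    adversarial = bce(pred, torch.ones_like(pred))
    # Perceptual loss: content loss plus the adversarial loss weighted by 1e-3.
    return content + 1e-3 * adversarial
```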

4. Images For Training:

The images for training were generated by downloading random image sets from the internet and converting them to HR (256 x 256) and LR (64 x 64) images of the required sizes.
This was done with a short script, sketched below:
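
A minimal version of that preprocessing using Pillow; the folder names here are placeholders:

```python
import os
from PIL import Image

RAW_DIR = "raw"       # downloaded source images (placeholder path)
HR_DIR = "data/hr"    # 256 x 256 high-resolution images
LR_DIR = "data/lr"    # 64 x 64 low-resolution counterparts

os.makedirs(HR_DIR, exist_ok=True)
os.makedirs(LR_DIR, exist_ok=True)

for name in os.listdir(RAW_DIR):
    try:
        img = Image.open(os.path.join(RAW_DIR, name)).convert("RGB")
    except OSError:
        continue  # skip files that aren't readable images
    base = os.path.splitext(name)[0]
    hr = img.resize((256, 256), Image.BICUBIC)
    lr = hr.resize((64, 64), Image.BICUBIC)
    hr.save(os.path.join(HR_DIR, base + ".png"))
    lr.save(os.path.join(LR_DIR, base + ".png"))
```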

A total of around 1200 images were used, out of which I did an 80/20 split for training and testing.
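
That split can be done in a few lines (continuing from the preprocessing sketch above; the seed is an arbitrary choice):

```python
import os
import random

files = sorted(os.listdir("data/hr"))  # HR/LR pairs share file names
random.seed(42)                        # arbitrary seed for a reproducible split
random.shuffle(files)
split = int(0.8 * len(files))
train_files, test_files = files[:split], files[split:]
```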

5. Training and Testing:

The model was trained for only about 30 epochs on limited data, due to the unavailability of enough compute.
Even with this limited training, the model showed enough improvement to demonstrate what could be achieved if enough compute were available.
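
For reference, a minimal sketch of one training epoch, wiring together the generator, discriminator, and perceptual loss from the earlier sketches; the optimizers and the paired LR/HR DataLoader are assumed to exist:

```python
import torch
import torch.nn as nn

def train_epoch(generator, discriminator, loader, g_opt, d_opt):
    bce = nn.BCELoss()
    for lr_imgs, hr_imgs in loader:
        # Discriminator step: real HR images vs. generated SR images.
        sr_imgs = generator(lr_imgs).detach()  # detach so only D is updated here
        d_real = discriminator(hr_imgs)
        d_fake = discriminator(sr_imgs)
        d_loss = (bce(d_real, torch.ones_like(d_real)) +
                  bce(d_fake, torch.zeros_like(d_fake)))
        d_opt.zero_grad()
        d_loss.backward()
        d_opt.step()

        # Generator step: minimize the perceptual loss defined earlier.
        sr_imgs = generator(lr_imgs)
        g_loss = perceptual_loss(discriminator, sr_imgs, hr_imgs)
        g_opt.zero_grad()
        g_loss.backward()
        g_opt.step()
```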

Let's look at the example of this image of a bike:

In the Gen3 (generator after 30 epochs) image, we can clearly see the improvement along the edges of the bike compared to the LR image.

Here we can compare the models after 5 (Gen1), 15 (Gen2), and 30 (Gen3) epochs of training:

The Gen2 image of the car is much smoother than the Gen1 or LR image, but there isn't any huge difference in Gen3. After epochs 15-20 there were only slight improvements in colors and textures.

6. Result Analysis:

From the two examples, the model's main error appears to be color grading: bright white spots come out green in the generated images. The green spots grow in Gen2 but shrink slightly in Gen3, which suggests the model simply hasn't been trained enough.
This color-grading problem may have arisen from repeatedly retraining a saved model (a workaround for the limited compute), together with the small dataset and short training time.

From this car test we can see the color improvement in Gen2 over Gen1. So, given enough training, the model should fix the color grading on its own.

7. Conclusion:

Even though we couldn't get mind-blowing results out of the model, we saw enough improvement to glimpse the possibilities.
This implementation of an established paper was a fun learning experience, and I would recommend that everyone pick a paper and try to implement it in any capacity possible.