BSRGAN

class face_crop_plus.models.rrdb.RRDBNet(min_face_factor=0.001)[source]

Bases: Module, LoadMixin

Face quality enhancer.

This model detects images with low-quality faces, i.e., images whose face areas are small relative to the image dimensions, and enhances those images. The images are up-scaled 4 times and then resized back to their original size - this results in less blurry faces.

This class also inherits the load method from the LoadMixin class. The method takes a device on which to load the model, loads the weights from the default state dictionary stored in the WEIGHTS_FILENAME file, sets the model to eval mode, and disables gradients.

For more information on how the RRDB (BSRGAN) model works, see this repo: BSRGAN. Most of the code was taken from that repository.

Note

Whenever an input shape is mentioned, N corresponds to batch size, C corresponds to the number of channels, H - to input height, and W - to input width.

WEIGHTS_FILENAME = 'bsrgan_x4_enhancer.pth'

The constant specifying the name of .pth file from which the weights for this model should be loaded. Defaults to “bsrgan_x4_enhancer.pth”.

Type:

WEIGHTS_FILENAME (str)

__init__(min_face_factor=0.001)[source]

Initializes RRDB (BSRGAN) model.

Just assigns the minimum face threshold attribute and constructs module layers for quality inference.

Parameters:

min_face_factor (float) – The minimum average face factor, i.e., face area relative to the image, below which the whole image is enhanced. Defaults to 0.001.
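To make the threshold concrete, here is a minimal sketch of how an average face factor could be computed from landmarks and compared against min_face_factor. The helper name average_face_factor and the use of the landmark bounding box for face width/height are assumptions for illustration, not the library's actual implementation.

```python
import numpy as np

def average_face_factor(landmarks, image_hw, indices, image_idx):
    # Hypothetical helper: average face area relative to the image area,
    # over all landmark sets belonging to one image. Face width/height are
    # approximated here by the bounding box of the 5 landmark points.
    h, w = image_hw
    factors = []
    for lm, idx in zip(landmarks, indices):
        if idx != image_idx:
            continue
        face_w = lm[:, 0].max() - lm[:, 0].min()
        face_h = lm[:, 1].max() - lm[:, 1].min()
        factors.append((face_w * face_h) / (h * w))
    return float(np.mean(factors))

# One face spanning roughly 20x20 px in a 1000x1000 image
lm = np.array([[100, 100], [120, 100], [110, 110],
               [102, 120], [118, 120]], dtype=float)
factor = average_face_factor([lm], (1000, 1000), [0], 0)
# factor = (20 * 20) / 1_000_000 = 0.0004, below the 0.001 default,
# so this image would be selected for enhancement
```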

forward(x)[source]

Performs forward pass.

Takes an input tensor which is a batch of images and produces the same batch, except images are up-scaled 4 times.

Parameters:

x (Tensor) – The input tensor of shape (N, 3, H, W).

Return type:

Tensor

Returns:

An output tensor of shape (N, 3, 4*H, 4*W).
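As a shape sanity check, the forward contract (N, 3, H, W) -> (N, 3, 4*H, 4*W) can be mimicked with a plain nearest-neighbour 4x upscale. This is only a stand-in for the learned super-resolution network; the real model produces much sharper output but follows the same shape contract.

```python
import numpy as np

def upscale_4x(x):
    # Stand-in for RRDBNet.forward: nearest-neighbour repetition
    # maps (N, 3, H, W) -> (N, 3, 4*H, 4*W).
    return x.repeat(4, axis=2).repeat(4, axis=3)

batch = np.zeros((2, 3, 16, 16), dtype=np.float32)
out = upscale_4x(batch)
# out.shape == (2, 3, 64, 64)
```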

predict(images, landmarks, indices)[source]

Enhances the quality of images with low-quality faces.

Takes a batch of images and sets of landmarks for each image and enhances the quality of those images for which the average face area factor is lower than self.min_face_factor. The face factor is computed by dividing the face area by the image area, where the face area is the width times the height of the region spanned by the left-eye, right-eye, left-mouth, and right-mouth landmark coordinates.

Note

The images are enhanced one by one instead of as a batch because the inference is very memory consuming and can result in memory errors.

Parameters:
  • images (Tensor | list[Tensor]) – Image batch of shape (N, 3, H, W) in RGB form with float values from 0.0 to 255.0. It must be on the same device as this model. It can also be a list of tensors of different shapes.

  • landmarks (Optional[ndarray]) – Landmarks batch of shape (num_faces, 5, 2) used to compute average face area for each image. If None, then every image will be enhanced.

  • indices (Optional[list[int]]) – Indices list mapping each set of landmarks to a specific image in images batch (multiple sets of landmarks can come from the same image). If None, then every image will be enhanced.

Return type:

Tensor

Returns:

The same image batch as images: the shape is (N, 3, H, W), channels are in RGB, and values range from 0.0 to 255.0. The only difference is that some of the images are of much higher quality, i.e., less blurry.
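The predict control flow described above can be sketched as follows. The function name enhance_if_needed is hypothetical, and nearest-neighbour upscaling plus subsampling stands in for the real network and resize step; the key point is that only images below the threshold are processed, one at a time, and every output keeps its input's size.

```python
import numpy as np

MIN_FACE_FACTOR = 0.001  # default threshold from __init__

def enhance_if_needed(image, face_factor):
    # Hypothetical sketch of predict() for a single image of shape (3, H, W).
    # Images with a large enough face factor are returned untouched.
    if face_factor >= MIN_FACE_FACTOR:
        return image
    # Stand-in "forward" pass: up-scale 4x to (3, 4*H, 4*W) ...
    up = image.repeat(4, axis=1).repeat(4, axis=2)
    # ... then resize back to the original (3, H, W).
    return up[:, ::4, ::4]

img = np.random.rand(3, 32, 32).astype(np.float32)
out = enhance_if_needed(img, face_factor=0.0004)
# out.shape == (3, 32, 32): enhanced images keep their original size
```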