Cropper
- class face_crop_plus.cropper.Cropper(output_size=256, output_format=None, resize_size=1024, face_factor=0.65, strategy='largest', padding='constant', allow_skew=False, landmarks=None, attr_groups=None, mask_groups=None, det_threshold=0.6, enh_threshold=None, batch_size=8, num_processes=1, device='cpu')[source]
Bases:
object
Face cropper class with bonus features.
This class is capable of automatically aligning and center-cropping faces, enhancing image quality and grouping the extracted faces according to specified face attributes, as well as generating masks for those attributes.
Capabilities
This class has the following 3 main features:
- Face cropping - automatic face alignment and cropping based on landmarks. The landmarks can either be predicted via a face detection model (see RetinaFace) or provided via a txt, csv, json, etc. file. It is possible to control the face factor in the extracted images and the strategy of extraction (e.g., largest face, all faces per image).
- Face enhancement - automatic quality enhancement of images where the relative face area is small. For instance, an image may contain many faces, but the quality of those faces, when zoomed in, is low. The quality enhancement feature removes this blurriness. It can also enhance the quality of every image, if desired (see RRDBNet).
- Face parsing - automatic face attribute parsing and grouping into sub-directories according to selected attributes. Attributes can indicate which properties faces in a group should contain, e.g., “earrings and necklace”, “glasses”. They can also indicate which properties the faces should not include, e.g., a “no accessories” group would include faces without hats, glasses, earrings, necklaces, etc. It is also possible to generate masks for selected face attributes, e.g., “glasses”, “eyes and eyebrows”. For more intuition on how grouping works, see BiSeNet and save_groups().
The class is designed to perform all, or some combination, of these functions in one go; however, each feature is independent and can also be applied on its own. For example, it is possible to first extract all the faces to some output directory, then apply quality enhancement to every face to produce better-quality faces in another output directory, and finally apply face parsing to group the faces into different sub-folders according to common attributes in a final output directory.
It is possible to configure the number of processing units and the batch size for significant speedups, if the hardware allows.
Examples
- Command line example
>>> python face_crop_plus -i path/to/images -o path/to/out/dir
- Auto face cropping (with face factor) and quality enhancement:
>>> cropper = Cropper(face_factor=0.7, enh_threshold=0.01)
>>> cropper.process_dir(input_dir="path/to/images")
- Very fast cropping with already known landmarks (no enhancement):
>>> cropper = Cropper(landmarks="path/to/landmarks.txt", num_processes=24, enh_threshold=None)
>>> cropper.process_dir(input_dir="path/to/images")
- Face cropping into attribute groups with a custom output dir:
>>> attr_groups = {"glasses": [6], "no_glasses_hats": [-6, -18]}
>>> cropper = Cropper(attr_groups=attr_groups)
>>> inp, out = "path/to/images", "path/to/parent/out/dir"
>>> cropper.process_dir(input_dir=inp, output_dir=out)
- Face cropping and grouping by face attributes (+ generating masks):
>>> groups = {"glasses": [6], "eyes_and_eyebrows": [2, 3, 4, 5]}
>>> cropper = Cropper(output_format="png", mask_groups=groups)
>>> cropper.process_dir("path/to/images")
For grouping by face attributes, see documented face attribute indices in
BiSeNet
.
Class Attributes
For how to initialize the class and to understand its functionality better, please refer to class attributes initialized via
__init__()
. Further class attributes, described here, are automatically initialized via _init_models() and _init_landmarks_target().
- det_model
Face detection model (
torch.nn.Module
) that is capable of detecting faces and predicting landmarks used for face alignment. See RetinaFace
.
- Type: torch.nn.Module
- enh_model
Image quality enhancement model (torch.nn.Module) that is capable of enhancing the quality of images with faces. It can automatically detect which faces to enhance based on average face area in the image, compared to the whole image area. See
RRDBNet
.
- Type: torch.nn.Module
- par_model
Face parsing model (torch.nn.Module) that is capable of classifying pixels according to specific face attributes, e.g., “left_eye”, “earring”. It is able to group faces to different groups and generate attribute masks. See
BiSeNet
.
- Type: torch.nn.Module
- landmarks_target
Standard normalized landmarks of shape (
self.num_std_landmarks
, 2). These are scaled by self.face_factor
and used as ideal landmark coordinates for the extracted faces. In other words, they are reference landmarks used to estimate the transformation of an image based on some actual set of face landmarks for that image.
- Type:
numpy.ndarray
- __init__(output_size=256, output_format=None, resize_size=1024, face_factor=0.65, strategy='largest', padding='constant', allow_skew=False, landmarks=None, attr_groups=None, mask_groups=None, det_threshold=0.6, enh_threshold=None, batch_size=8, num_processes=1, device='cpu')[source]
Initializes the cropper.
Initializes class attributes.
- Parameters:
output_size (
int
|tuple
[int
,int
] |list
[int
]) – The output size (width, height) of cropped image faces. If provided as a single number, the same value is used for both width and height. Defaults to 256.
output_format (
Optional
[str
]) – The output format of the saved face images. For available options, see OpenCV imread. If not specified, the image extension is not changed, i.e., face images will be of the same format as the images from which they are extracted. Defaults to None.
resize_size (
int
|tuple
[int
,int
] |list
[int
]) – The interim size (width, height) each image should be resized to before processing. This is used to resize images to a common size so that they can be stacked into a batch. It should ideally be the mean width and height of all the images to be processed (but can simply be a square). Images will be resized to the specified size while maintaining the aspect ratio (one of the dimensions will always match either the specified width or height). The shorter dimension is afterwards padded - for more information on how this works, see utils.create_batch_from_files()
. Defaults to 1024.
face_factor (
float
) – The fraction of the face area relative to the output image. Defaults to 0.65.
strategy (
str
) – The strategy to use to extract faces from each image. The available options are:
“all” - all faces will be extracted from each image.
“best” - one face with the largest confidence score will be extracted from each image.
“largest” - one face with the largest face area will be extracted from each image.
For more info, see
RetinaFace.__init__()
. Defaults to “largest”.
padding (
str
) – The padding type (border mode) to apply when cropping out faces. If a face is near an edge, some part of the resulting center-cropped face image may be blank, in which case it can be padded with specific values. For available options, see OpenCV BorderTypes. If specified as “constant”, the value of 0 will be used. Defaults to “constant”.
allow_skew (
bool
) – Whether to allow skewing when aligning the face according to its landmarks. If True, facial points will be matched very closely to the ideal standard landmark points (a set of reference points created internally when performing the transformation). If all faces face forward, i.e., in a portrait-like manner, this could be set to True, which results in minimal perspective changes. However, most of the time this should be set to False to preserve the face perspective. For more details, see crop_align()
. Defaults to False.
landmarks (
Union
[str
,tuple
[ndarray
,ndarray
],None
]) –If landmarks are already known, they should be specified via this variable. If specified, landmark estimation will not be performed. There are 2 ways to specify landmarks:
As a path to landmarks file, in which case str should be provided. The specified file should contain file (image) names and corresponding landmark coordinates. Duplicate file names are allowed (in case multiple faces are present in the same image). For instance, it could be .txt file where each row contains space-separated values: the first value is the file name and the other 136 values represent landmark coordinates in x1, y1, x2, y2, … format. For more details about the possible file formats and how they are parsed, see
parse_landmarks_file()
.
As a tuple of 2 numpy arrays. The first one is of shape (
num_faces
,num_landm
, 2) of type numpy.float32
and represents the landmarks of every face that is going to be extracted from images. The second is a numpy array of shape (num_faces
,) of type numpy.str_
where each value specifies a file name to which a corresponding set of landmarks belongs.
If not specified, 5 landmark coordinates will be estimated for each face automatically. Defaults to None.
attr_groups (
Optional
[dict
[str
,list
[int
]]]) – Attribute groups dictionary that specifies how to group the output face images according to some common attributes. The keys are names describing some common attribute, e.g., “glasses”, “no_accessories”, and the values specify which attribute indices belong (or don’t belong, if negative) to that group, e.g., [6], [-6, -9, -15]. For more information, see BiSeNet
and save_groups()
. If not provided, output images will not be grouped by attributes and no attribute sub-folders will be created in the output directory. Defaults to None.
mask_groups (
Optional
[dict
[str
,list
[int
]]]) – Mask groups dictionary that specifies how to group the output face images according to some face attributes that make up a segmentation mask. The keys are mask type names, e.g., “eyes”, and the values specify which attribute indices should be considered for that mask, e.g., [4, 5]. For every group, not only face images will be saved in a corresponding sub-directory, but also black and white face attribute masks (white pixels indicating the presence of a mask attribute). For more details, see BiSeNet
and save_groups()
. If not provided, no grouping is applied. Defaults to None.
det_threshold (
Optional
[float
]) – The visual threshold, i.e., the minimum confidence score, for a detected face to be considered an actual face. See RetinaFace.__init__()
for more details. If None, no face detection will be performed. Defaults to 0.6.
enh_threshold (
Optional
[float
]) – Quality enhancement threshold that determines when image quality should be enhanced (an expensive operation). It is the minimum average face factor, i.e., face area relative to the image area, below which the whole image is enhanced. It is advised to set this to a low number, like 0.001 - very high fractions might cause image quality to be enhanced unnecessarily. Defaults to None.
batch_size (
int
) – The batch size. It is the maximum number of images that can be processed by every processor at a single time-step. Large values may result in memory errors, especially when GPU acceleration is used. Increase this if fewer models (i.e., landmark detection, quality enhancement, face parsing models) are used and decrease it otherwise. Defaults to 8.
num_processes (
int
) – The number of processes to launch to perform image processing. Each process works in parallel on multiple threads, significantly increasing performance. Increase this if fewer prediction models are used and decrease it otherwise. Defaults to 1.
device (
str
|device
) – The device on which to perform the predictions, i.e., landmark detection, quality enhancement and face parsing. If landmarks are provided and no enhancement or parsing is desired, this has no effect. Defaults to “cpu”.
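The landmarks-file format described above (a file name followed by space-separated coordinate values per row) can be parsed with a few lines of NumPy. This is a hypothetical sketch, not the library's actual parse_landmarks_file(); the helper name and the 4-point toy rows are illustrative only:

```python
import numpy as np

def parse_landmarks_txt(lines):
    """Parse rows of 'file_name x1 y1 x2 y2 ...' into two arrays:
    landmarks of shape (num_faces, num_landm, 2) and file names of
    shape (num_faces,). Duplicate file names are allowed."""
    names, coords = [], []
    for line in lines:
        parts = line.split()
        names.append(parts[0])
        # Remaining values are x, y pairs; reshape into (num_landm, 2).
        coords.append(np.array(parts[1:], dtype=np.float32).reshape(-1, 2))
    return np.stack(coords), np.array(names, dtype=np.str_)

# Two faces from the same image (duplicate file names are allowed):
rows = [
    "img_0.jpg " + " ".join(str(float(i)) for i in range(8)),
    "img_0.jpg " + " ".join(str(float(i)) for i in range(8)),
]
landmarks, files = parse_landmarks_txt(rows)
print(landmarks.shape, files.shape)  # (2, 4, 2) (2,)
```

A real 68-point landmarks file would have 136 coordinate values per row, yielding arrays of shape (num_faces, 68, 2).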
- _init_landmarks_target()[source]
Initializes target landmarks set.
This method initializes a set of standard landmarks. Standard, or target, landmarks refer to an average set of landmarks with ideal normalized coordinates for each facial point. The source facial points will be rotated, scaled and translated to match the standard landmarks as closely as possible.
Both source (computed separately for each image) and target landmarks must semantically match, e.g., the left eye coordinate in target landmarks also corresponds to the left eye coordinate in source landmarks.
There should be a standard landmarks set defined for a desired number of landmarks. Each coordinate in that set is normalized, i.e., x and y values are between 0 and 1. These values are then scaled based on face factor and resized to match the desired output size as defined by
self.output_size
.
Note
Currently, only 5 standard landmarks are supported.
- Raises:
ValueError – If the number of standard landmarks is not supported. The number of standard landmarks is
self.num_std_landmarks
.
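As a rough numerical illustration of the scaling described above (the exact reference coordinates are internal to the library, so the 5-point values below are assumed, not the real ones), normalized standard landmarks can be shrunk around the image center by the face factor and mapped to output-image pixel coordinates:

```python
import numpy as np

# Hypothetical normalized 5-point landmarks (x, y in [0, 1]):
# left eye, right eye, nose tip, left/right mouth corners.
std_landmarks = np.array([
    [0.31, 0.46], [0.69, 0.46], [0.50, 0.64], [0.35, 0.82], [0.65, 0.82],
], dtype=np.float32)

face_factor, output_size = 0.65, (256, 256)

# Shrink the landmark spread around the image center by face_factor,
# then scale to pixel coordinates of the output image.
scaled = (std_landmarks - 0.5) * face_factor + 0.5
target = scaled * np.array(output_size, dtype=np.float32)
print(target.shape)  # (5, 2)
```

A smaller face factor pulls the target points closer to the center, leaving more background around the cropped face.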
- _init_models()[source]
Initializes detection, enhancement and parsing models.
- The method initializes 3 models:
If self.det_threshold is provided and no landmarks are known in advance, the detection model is initialized to estimate 5-point landmark coordinates. For more info, see RetinaFace.
If self.enh_threshold is provided, the quality enhancement model is initialized. For more info, see RRDBNet.
If self.attr_groups or self.mask_groups is provided, the face parsing model is initialized. For more info, see BiSeNet.
Note
This is a useful initializer function if multiprocessing is used, in which case copies of all the models can be created on separate cores.
- crop_align(images, padding, indices, landmarks_source)[source]
Aligns and center-crops faces based on the given landmarks.
This method takes a batch of images (can be padded), and loops through each image (represented as a numpy array) performing the following actions:
Removes the padding.
Estimates affine transformation from source landmarks to standard landmarks.
Applies transformation to align and center-crop the face based on the face factor.
Returns a batch of face images represented as numpy arrays of the same length as the number of landmark sets.
A crucial role in this method is played by
self.landmarks_target
, which is the standard set of landmarks used as a reference for the source landmarks. Target and source landmark sets are used to estimate the transformations of images - each image to which a set of landmarks (from the source landmarks batch) belongs is transformed so that its landmarks match the standard (target) landmark set as closely as possible. For more details about target landmarks, check _init_landmarks_target()
.
Note
If
self.allow_skew
is set to True, then facial points will also be skewed to match self.landmarks_target
as closely as possible (resulting in, e.g., longer/flatter faces than in the original images).
- Parameters:
images (
ndarray
|list
[ndarray
]) – Image batch of shape (N, H, W, 3) of type numpy.uint8
(doesn’t matter if RGB or BGR) where each nth image is transformed to extract face(-s). (H, W) should be self.resize_size
. It can also be a list of numpy.uint8
numpy arrays of different shapes.
padding (
Optional
[ndarray
]) – Padding of shape (N, 4) where the integer values correspond to the number of pixels padded from each side: top, bottom, left, right. Padding was originally applied to each image, e.g., to make the image square, so that all images could be stacked as a batch. Therefore, it is needed here to remove the padding. If specified as None, it is assumed that the images are un-padded.
indices (
list
[int
]) – Indices list of length num_faces where each index specifies which image is used to extract faces for each set of landmarks in landmarks_source
.
landmarks_source (
ndarray
) – Landmarks batch of shape (num_faces, self.num_std_landmarks
, 2). These are landmark sets of all the desired faces to extract from the given batch of N images.
- Return type:
ndarray
- Returns:
A batch of aligned and center-cropped faces where the factor of the area of a face relative to the whole face image area is
self.face_factor
. The output is a numpy array of shape (N, H, W, 3) of type numpy.uint8
(same channel structure as for the input images). (H, W) is defined by self.output_size
.
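The transform-estimation step in crop_align() can be sketched with plain NumPy least squares. This is a simplified stand-in for the library's internal estimation, assuming allow_skew=False, i.e., the transform is restricted to rotation, uniform scale and translation (a similarity transform); the helper name and test points are illustrative only:

```python
import numpy as np

def estimate_similarity(src, dst):
    """Estimate a 2x3 similarity transform mapping src -> dst.

    Solves for scale*rotation + translation via linear least squares,
    parameterized as [a, b, tx, ty] with matrix [[a, -b, tx], [b, a, ty]].
    """
    n = len(src)
    A = np.zeros((2 * n, 4))
    rhs = np.zeros(2 * n)
    # x' = a*x - b*y + tx
    A[0::2, 0], A[0::2, 1], A[0::2, 2] = src[:, 0], -src[:, 1], 1
    # y' = b*x + a*y + ty
    A[1::2, 0], A[1::2, 1], A[1::2, 3] = src[:, 1], src[:, 0], 1
    rhs[0::2], rhs[1::2] = dst[:, 0], dst[:, 1]
    (a, b, tx, ty), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return np.array([[a, -b, tx], [b, a, ty]])

src = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [0.5, 0.5]])
dst = src * 2 + [3, 4]  # pure scale + translation
M = estimate_similarity(src, dst)
```

With allow_skew=True, a full affine transform (6 free parameters) would be estimated instead, which can additionally stretch and shear the face.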
- process_batch(file_names, input_dir, output_dir)[source]
Extracts faces from a batch of images and saves them.
Takes file names and an input directory, reads the images, extracts the faces, and saves them to the output directory. This method works as follows:
Batch generation - a batch of images from the given file names is generated. Each image is padded and resized to
self.resize_size
while keeping the same aspect ratio.
Landmark detection - the detection model is used to predict 5 landmarks for each face in each image, unless the landmarks were already initialized or face alignment + cropping is not needed.
Image enhancement - some images are enhanced if the faces are small relative to the image size. If landmarks are None, i.e., if no alignment + cropping was desired, all images are enhanced. Enhancement is not done if
self.enh_threshold
is None.
Image grouping - each face image is parsed, i.e., a map of face attributes is generated. Based on those attributes, each face image is put into a corresponding group. There may also be mask groups, in which case masks for each image in that group are also generated. Faces are not parsed if
self.attr_groups
and self.mask_groups
are both None.
Image saving - each face image (and a potential mask) is saved according to the group structure (if any).
Note
If the detection model is not used, then the batch is just a list of loaded images of different dimensions.
- Parameters:
file_names (
list
[str
]) – The list of image file names (not full paths). All the images should be in the same directory.
input_dir (
str
) – Path to input directory with image files.
output_dir (
str
) – Path to output directory to save the extracted face images.
- process_dir(input_dir, output_dir=None, desc='Processing')[source]
Processes images in the specified input directory.
Splits all the file names in the input directory to batches and processes batches on multiple cores. For every file name batch, images are loaded, some are optionally enhanced, landmarks are generated and used to optionally align and center-crop faces, and grouping is optionally applied based on face attributes. For more details, check
process_batch()
.
Note
There might be a few seconds delay before the actual processing starts if there are a lot of files in the directory - it takes some time to split all the file names to batches.
- Parameters:
input_dir (
str
) – Path to input directory with image files.
output_dir (
Optional
[str
]) – Path to output directory to save the extracted (and optionally grouped to sub-directories) face images. If None, then the same path as for input_dir
is used and additionally a “_faces” suffix is added to the name.
desc (
Optional
[str
]) – The description to use for the progress bar. If specified as None
, no progress bar is shown. Defaults to “Processing”.
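The batch-splitting step that process_dir() performs before dispatching work can be sketched with a small stand-in helper (hypothetical, not the library's actual function):

```python
def split_into_batches(file_names, batch_size):
    """Split a list of file names into consecutive batches of at most
    batch_size items; the last batch may be smaller."""
    return [file_names[i:i + batch_size]
            for i in range(0, len(file_names), batch_size)]

files = [f"img_{i}.jpg" for i in range(10)]
batches = split_into_batches(files, batch_size=4)
print([len(b) for b in batches])  # [4, 4, 2]
```

Each batch would then be handed to process_batch() on one of the worker processes.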
- save_group(faces, file_names, output_dir)[source]
Saves a group of images to output directory.
Takes in a batch of faces or masks as well as corresponding file names from where the faces were extracted and saves the faces/masks to a specified output directory with the same names as those image files (appends counter suffixes if multiple faces come from the same file). If the batch of face images/masks is empty, then the output directory is not created either.
- Parameters:
faces (
ndarray
) – Face images (cropped and aligned) represented as a numpy array of shape (N, H, W, 3) with values of type numpy.uint8
ranging from 0 to 255. It may also be a face mask batch of shape (N, H, W) with values of 255 where some face attribute is present and 0 elsewhere.
file_names (
list
[str
]) – The list of filenames of length N. Each face comes from a specific file whose name is also used to save the extracted face. If self.strategy
allows multiple faces to be extracted from the same file, such as “all”, counters are added at the end of the filenames.
output_dir (
str
) – The output directory to save faces
.
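The counter-suffix behavior for duplicate file names can be illustrated with a small stand-in helper (the library's exact naming scheme may differ; this sketch only shows the idea):

```python
import os
from collections import Counter

def unique_save_names(file_names):
    """Append counter suffixes when a file name repeats in the batch."""
    seen = Counter()
    out = []
    for name in file_names:
        stem, ext = os.path.splitext(name)
        if seen[name]:
            # Second and later faces from the same file get a counter.
            out.append(f"{stem}_{seen[name]}{ext}")
        else:
            out.append(name)
        seen[name] += 1
    return out

print(unique_save_names(["a.jpg", "a.jpg", "b.png"]))
# ['a.jpg', 'a_1.jpg', 'b.png']
```

This matters with strategy="all", where several faces extracted from one image must not overwrite each other on disk.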
- save_groups(faces, file_names, output_dir, attr_groups, mask_groups)[source]
Saves images (and masks) group-wise.
This method takes a batch of face images of equal dimensions, a batch of file names identifying which image each face comes from, and, optionally, attribute and/or mask groups telling how to split the face images (and masks) across different folders. This method then loops through all the groups and saves images accordingly.
- Example 1:
If neither
attr_groups
nor mask_groups
are provided, the face images will be saved according to this structure:

├── output_dir
|    ├── face_image_0.jpg
|    ├── face_image_1.png
|    ...
- Example 2:
If only
attr_groups
is provided (keys are names describing common attributes across faces in that group and they are also sub-directories of output_dir
), the structure is as follows:

├── output_dir
|    ├── attribute_group_1
|    |    ├── face_image_0.jpg
|    |    ├── face_image_1.png
|    |    ...
|    ├── attribute_group_2
|    ...
- Example 3:
If only
mask_groups
is provided (keys are names describing the mask type and they are also sub-directories of output_dir
), the structure is as follows:

├── output_dir
|    ├── group_1
|    |    ├── face_image_0.jpg
|    |    ├── face_image_1.png
|    |    ...
|    ├── group_1_mask
|    |    ├── face_image_0.jpg
|    |    ├── face_image_1.png
|    |    ...
|    ├── group_2
|    |    ...
|    ├── group_2_mask
|    |    ...
|    ...
- Example 4:
If both
attr_groups
and mask_groups
are provided, then all images and masks will first be grouped by attributes and then by mask groups. The structure is then as follows:

├── output_dir
|    ├── attribute_group_1
|    |    ├── group_1
|    |    |    ├── face_image_0.jpg
|    |    |    ├── face_image_1.png
|    |    |    ...
|    |    ├── group_1_mask
|    |    |    ├── face_image_0.jpg
|    |    |    ├── face_image_1.png
|    |    |    ...
|    |    ├── group_2
|    |    |    ...
|    |    ├── group_2_mask
|    |    |    ...
|    |    ...
|    ├── attribute_group_2
|    |    ...
|    ...
- Parameters:
faces (
ndarray
) – Face images (cropped and aligned) represented as a numpy array of shape (N, H, W, 3) with values of type numpy.uint8
ranging from 0 to 255.
file_names (
ndarray
) – File names of images from which the faces were extracted. This value is a numpy array of shape (N,) with values of type numpy.str_
. Each nth face in faces
maps to exactly one nth file name in this array; thus there may be duplicate file names (because different faces may come from the same file).
output_dir (
str
) – The output directory where the faces or folders of faces will be saved.
attr_groups (
Optional
[dict
[str
,list
[int
]]]) – Face groups by attributes. Each key represents the group name (describing common attributes across faces) and each value is a list of indices identifying faces (from faces) that should go to that group.
mask_groups (
Optional
[dict
[str
,tuple
[list
[int
],ndarray
]]]) – Face groups by extracted masks. Each key represents the group name (describing the mask type) and each value is a tuple where the first element is a list of indices identifying faces (from faces
) that should go to that group and the second element is a batch of masks corresponding to the indexed faces, represented as numpy arrays of shape (N, H, W) with values of type numpy.uint8
and being either 0 (negative) or 255 (positive).
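The positive/negative index convention behind attr_groups can be illustrated with a small stand-in helper (hypothetical, not the library's actual grouping code; the attribute indices used below follow the example values from __init__()):

```python
def matches_group(face_attrs, group_spec):
    """Check whether a face belongs to an attribute group.

    face_attrs: set of attribute indices present in the face.
    group_spec: list of indices; a positive index means the attribute
    must be present, a negative one means the attribute (its absolute
    value) must be absent.
    """
    for idx in group_spec:
        if idx >= 0 and idx not in face_attrs:
            return False
        if idx < 0 and -idx in face_attrs:
            return False
    return True

# A face with glasses (6) and an earring (9):
face = {6, 9}
print(matches_group(face, [6]))            # True: in the "glasses" group
print(matches_group(face, [-6, -9, -15]))  # False: not "no_accessories"
```

A face may satisfy several group specifications at once, in which case it would be saved into each matching sub-directory.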