Cropper
- class face_crop_plus.cropper.Cropper(output_size=256, output_format=None, resize_size=1024, face_factor=0.65, strategy='largest', padding='constant', allow_skew=False, landmarks=None, attr_groups=None, mask_groups=None, det_threshold=0.6, enh_threshold=None, batch_size=8, num_processes=1, device='cpu')[source]
Bases:
object
Face cropper class with bonus features.
This class is capable of automatically aligning and center-cropping faces, enhancing image quality and grouping the extracted faces according to specified face attributes, as well as generating masks for those attributes.
Capabilities
This class has the following 3 main features:
- Face cropping - automatic face alignment and cropping based on landmarks. The landmarks can either be predicted via a face detection model (see RetinaFace) or provided via a txt, csv, json, etc. file. It is possible to control the face factor in the extracted images and the strategy of extraction (e.g., largest face, all faces per image).
- Face enhancement - automatic quality enhancement of images where the relative face area is small. For instance, an image may contain many faces, but the quality of those faces, when zoomed in, is low. The quality enhancement feature removes this blurriness. It can also enhance the quality of every image, if desired (see RRDBNet).
- Face parsing - automatic face attribute parsing and grouping into sub-directories according to selected attributes. Attributes can indicate which properties faces in a group should contain, e.g., “earrings and necklace”, “glasses”. They can also indicate which properties the faces should not include, e.g., a “no accessories” group would include faces without hats, glasses, earrings, necklaces, etc. It is also possible to generate masks for selected face attributes, e.g., “glasses”, “eyes and eyebrows”. For more intuition on how grouping works, see BiSeNet and save_groups().
The class is designed to perform all, or some combination, of these functions in one go; however, each feature is independent and can also be applied on its own. For example, it is possible to first extract all the faces to some output directory, then apply quality enhancement to every face to produce better-quality faces in another output directory, and finally apply face parsing to group the faces into different sub-folders according to common attributes in a final output directory.
It is possible to configure the number of processing units and the batch size for significant speedups, if the hardware allows.
Examples
- Command line example
>>> python face_crop_plus -i path/to/images -o path/to/out/dir
- Auto face cropping (with face factor) and quality enhancement:
>>> cropper = Cropper(face_factor=0.7, enh_threshold=0.01)
>>> cropper.process_dir(input_dir="path/to/images")
- Very fast cropping with already known landmarks (no enhancement):
>>> cropper = Cropper(landmarks="path/to/landmarks.txt", num_processes=24, enh_threshold=None)
>>> cropper.process_dir(input_dir="path/to/images")
- Face cropping into attribute groups with a custom output dir:
>>> attr_groups = {"glasses": [6], "no_glasses_hats": [-6, -18]}
>>> cropper = Cropper(attr_groups=attr_groups)
>>> inp, out = "path/to/images", "path/to/parent/out/dir"
>>> cropper.process_dir(input_dir=inp, output_dir=out)
- Face cropping and grouping by face attributes (+ generating masks):
>>> groups = {"glasses": [6], "eyes_and_eyebrows": [2, 3, 4, 5]}
>>> cropper = Cropper(output_format="png", mask_groups=groups)
>>> cropper.process_dir("path/to/images")
For grouping by face attributes, see documented face attribute indices in
BiSeNet
.
Class Attributes
For how to initialize the class and to understand its functionality better, please refer to class attributes initialized via
__init__()
. Further class attributes, described here, are automatically initialized via _init_models() and _init_landmarks_target().
- det_model
Face detection model (
torch.nn.Module
) that is capable of detecting faces and predicting landmarks used for face alignment. See RetinaFace
.
- Type: torch.nn.Module
- enh_model
Image quality enhancement model (torch.nn.Module) that is capable of enhancing the quality of images with faces. It can automatically detect which faces to enhance based on average face area in the image, compared to the whole image area. See
RRDBNet
.
- Type: torch.nn.Module
- par_model
Face parsing model (torch.nn.Module) that is capable of classifying pixels according to specific face attributes, e.g., “left_eye”, “earring”. It is able to group faces to different groups and generate attribute masks. See
BiSeNet
.
- Type: torch.nn.Module
- landmarks_target
Standard normalized landmarks of shape (
self.num_std_landmarks
, 2). These are scaled by self.face_factor
and used as ideal landmark coordinates for the extracted faces. In other words, they are reference landmarks used to estimate the transformation of an image based on some actual set of face landmarks for that image.
- Type:
numpy.ndarray
- __init__(output_size=256, output_format=None, resize_size=1024, face_factor=0.65, strategy='largest', padding='constant', allow_skew=False, landmarks=None, attr_groups=None, mask_groups=None, det_threshold=0.6, enh_threshold=None, batch_size=8, num_processes=1, device='cpu')[source]
Initializes the cropper.
Initializes class attributes.
- Parameters:
output_size (
int
|tuple
[int
,int
] |list
[int
]) – The output size (width, height) of cropped image faces. If provided as a single number, the same value is used for both width and height. Defaults to 256.
output_format (
Optional
[str
]) – The output format of the saved face images. For available options, see OpenCV imread. If not specified, the image extension is not changed, i.e., face images will be of the same format as the images from which they are extracted. Defaults to None.
resize_size (
int
|tuple
[int
,int
] |list
[int
]) – The interim size (width, height) each image should be resized to before processing. This is used to resize images to a common size so that they can be stacked into a batch. It should ideally be the mean width and height of all the images to be processed (but can simply be a square). Images will be resized to the specified size while maintaining the aspect ratio (one of the dimensions will always match either the specified width or height). The shorter dimension is afterwards padded - for more information on how this works, see utils.create_batch_from_files()
. Defaults to 1024.
face_factor (
float
) – The fraction of the face area relative to the output image. Defaults to 0.65.
strategy (
str
) – The strategy to use to extract faces from each image. The available options are:
“all” - all faces will be extracted from each image.
“best” - one face with the largest confidence score will be extracted from each image.
“largest” - one face with the largest face area will be extracted from each image.
For more info, see
RetinaFace.__init__()
. Defaults to “largest”.
padding (
str
) – The padding type (border mode) to apply when cropping out faces. If a face is near an edge, some part of the resulting center-cropped face image may be blank, in which case it can be padded with specific values. For available options, see OpenCV BorderTypes. If specified as “constant”, the value of 0 will be used. Defaults to “constant”.
allow_skew (
bool
) – Whether to allow skewing when aligning the face according to its landmarks. If True, facial points will be matched very closely to the ideal standard landmark points (a set of reference points created internally when performing the transformation). If all faces face forward, i.e., in a portrait-like manner, this could be set to True, which results in minimal perspective changes. However, most of the time this should be set to False to preserve the face perspective. For more details, see crop_align()
. Defaults to False.
landmarks (
Union
[str
,tuple
[ndarray
,ndarray
],None
]) –If landmarks are already known, they should be specified via this variable. If specified, landmark estimation will not be performed. There are 2 ways to specify landmarks:
As a path to landmarks file, in which case str should be provided. The specified file should contain file (image) names and corresponding landmark coordinates. Duplicate file names are allowed (in case multiple faces are present in the same image). For instance, it could be .txt file where each row contains space-separated values: the first value is the file name and the other 136 values represent landmark coordinates in x1, y1, x2, y2, … format. For more details about the possible file formats and how they are parsed, see
parse_landmarks_file()
.
As a tuple of 2 numpy arrays. The first one is of shape (
num_faces
,num_landm
, 2) of type numpy.float32
and represents the landmarks of every face that is going to be extracted from images. The second is a numpy array of shape (num_faces
,) of type numpy.str_
where each value specifies a file name to which a corresponding set of landmarks belongs.
If not specified, 5 landmark coordinates will be estimated for each face automatically. Defaults to None.
attr_groups (
Optional
[dict
[str
,list
[int
]]]) – Attribute groups dictionary that specifies how to group the output face images according to some common attributes. The keys are names describing some common attribute, e.g., “glasses”, “no_accessories”, and the values specify which attribute indices belong (or don’t belong, if negative) to that group, e.g., [6], [-6, -9, -15]. For more information, see BiSeNet
and save_groups()
. If not provided, output images will not be grouped by attributes and no attribute sub-folders will be created in the output directory. Defaults to None.
mask_groups (
Optional
[dict
[str
,list
[int
]]]) – Mask groups dictionary that specifies how to group the output face images according to some face attributes that make up a segmentation mask. The keys are mask type names, e.g., “eyes”, and the values specify which attribute indices should be considered for that mask, e.g., [4, 5]. For every group, not only face images will be saved in a corresponding sub-directory, but also black and white face attribute masks (white pixels indicating the presence of a mask attribute). For more details, see BiSeNet
and save_groups()
. If not provided, no grouping is applied. Defaults to None.
det_threshold (
Optional
[float
]) – The visual threshold, i.e., the minimum confidence score, for a detected face to be considered an actual face. See RetinaFace.__init__()
for more details. If None, no face detection will be performed. Defaults to 0.6.
enh_threshold (
Optional
[float
]) – Quality enhancement threshold that determines when image quality should be enhanced (an expensive operation). It is the minimum average face factor, i.e., face area relative to the image area, below which the whole image is enhanced. It is advised to set this to a low number, like 0.001 - very high fractions might cause image quality to be enhanced unnecessarily. Defaults to None.
batch_size (
int
) – The batch size. It is the maximum number of images that can be processed by every processor at a single time-step. Large values may result in memory errors, especially when GPU acceleration is used. Increase this if fewer models (i.e., landmark detection, quality enhancement, face parsing models) are used and decrease it otherwise. Defaults to 8.
num_processes (
int
) – The number of processes to launch to perform image processing. Each process works in parallel on multiple threads, significantly increasing performance. Increase this if fewer prediction models are used and decrease it otherwise. Defaults to 1.
device (
str
|device
) – The device on which to perform the predictions, i.e., landmark detection, quality enhancement and face parsing. If landmarks are provided and no enhancement or parsing is desired, this has no effect. Defaults to “cpu”.
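The landmarks-file format described above (a file name followed by space-separated coordinate values per row) can be parsed with a few lines of NumPy. This is a hypothetical sketch, not the library's actual parse_landmarks_file(); the helper name and the 4-point toy rows are illustrative only:

```python
import numpy as np

def parse_landmarks_txt(lines):
    """Parse rows of 'file_name x1 y1 x2 y2 ...' into two arrays:
    landmarks of shape (num_faces, num_landm, 2) and file names of
    shape (num_faces,). Duplicate file names are allowed."""
    names, coords = [], []
    for line in lines:
        parts = line.split()
        names.append(parts[0])
        # Remaining values are x, y pairs; reshape into (num_landm, 2).
        coords.append(np.array(parts[1:], dtype=np.float32).reshape(-1, 2))
    return np.stack(coords), np.array(names, dtype=np.str_)

# Two faces from the same image (duplicate file names are allowed):
rows = [
    "img_0.jpg " + " ".join(str(float(i)) for i in range(8)),
    "img_0.jpg " + " ".join(str(float(i)) for i in range(8)),
]
landmarks, files = parse_landmarks_txt(rows)
print(landmarks.shape, files.shape)  # (2, 4, 2) (2,)
```

A real 68-point landmarks file would have 136 coordinate values per row, yielding arrays of shape (num_faces, 68, 2).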
- _init_landmarks_target()[source]
Initializes target landmarks set.
This method initializes a set of standard landmarks. Standard, or target, landmarks refer to an average set of landmarks with ideal normalized coordinates for each facial point. The source facial points will be rotated, scaled and translated to match the standard landmarks as closely as possible.
Both source (computed separately for each image) and target landmarks must semantically match, e.g., the left eye coordinate in target landmarks also corresponds to the left eye coordinate in source landmarks.
There should be a standard landmarks set defined for a desired number of landmarks. Each coordinate in that set is normalized, i.e., x and y values are between 0 and 1. These values are then scaled based on face factor and resized to match the desired output size as defined by
self.output_size
.
Note
Currently, only 5 standard landmarks are supported.
- Raises:
ValueError – If the number of standard landmarks is not supported. The number of standard landmarks is
self.num_std_landmarks
.
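As a rough numerical illustration of the scaling described above (the exact reference coordinates are internal to the library, so the 5-point values below are assumed, not the real ones), normalized standard landmarks can be shrunk around the image center by the face factor and mapped to output-image pixel coordinates:

```python
import numpy as np

# Hypothetical normalized 5-point landmarks (x, y in [0, 1]):
# left eye, right eye, nose tip, left/right mouth corners.
std_landmarks = np.array([
    [0.31, 0.46], [0.69, 0.46], [0.50, 0.64], [0.35, 0.82], [0.65, 0.82],
], dtype=np.float32)

face_factor, output_size = 0.65, (256, 256)

# Shrink the landmark spread around the image center by face_factor,
# then scale to pixel coordinates of the output image.
scaled = (std_landmarks - 0.5) * face_factor + 0.5
target = scaled * np.array(output_size, dtype=np.float32)
print(target.shape)  # (5, 2)
```

A smaller face factor pulls the target points closer to the center, leaving more background around the cropped face.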
- _init_models()[source]
Initializes detection, enhancement and parsing models.
- The method initializes 3 models:
If self.det_threshold is provided and no landmarks are known in advance, the detection model is initialized to estimate 5-point landmark coordinates. For more info, see RetinaFace.
If self.enh_threshold is provided, the quality enhancement model is initialized. For more info, see RRDBNet.
If self.attr_groups or self.mask_groups is provided, the face parsing model is initialized. For more info, see BiSeNet.
Note
This is a useful initializer function if multiprocessing is used, in which case copies of all the models can be created on separate cores.
- crop_align(images, padding, indices, landmarks_source)[source]
Aligns and center-crops faces based on the given landmarks.
This method takes a batch of images (can be padded), and loops through each image (represented as a numpy array) performing the following actions:
Removes the padding.
Estimates affine transformation from source landmarks to standard landmarks.
Applies transformation to align and center-crop the face based on the face factor.
Returns a batch of face images represented as numpy arrays of the same length as the number of landmark sets.
A crucial role in this method is played by
self.landmarks_target
, which is the standard set of landmarks used as a reference for the source landmarks. Target and source landmark sets are used to estimate the transformations of images - each image to which a set of landmarks (from the source landmarks batch) belongs is transformed so that its landmarks match the standard (target) landmark set as closely as possible. For more details about target landmarks, check _init_landmarks_target()
.
Note
If
self.allow_skew
is set to True, then facial points will also be skewed to match self.landmarks_target
as closely as possible (resulting in, e.g., longer/flatter faces than in the original images).
- Parameters:
images (
ndarray
|list
[ndarray
]) – Image batch of shape (N, H, W, 3) of type numpy.uint8
(doesn’t matter if RGB or BGR) where each nth image is transformed to extract face(-s). (H, W) should be self.resize_size
. It can also be a list of numpy.uint8
numpy arrays of different shapes.
padding (
Optional
[ndarray
]) – Padding of shape (N, 4) where the integer values correspond to the number of pixels padded from each side: top, bottom, left, right. Padding was originally applied to each image, e.g., to make the image square, so that all images could be stacked as a batch. Therefore, it is needed here to remove the padding. If specified as None, it is assumed that the images are un-padded.
indices (
list
[int
]) – Indices list of length num_faces where each index specifies which image is used to extract faces for each set of landmarks in landmarks_source
.
landmarks_source (
ndarray
) – Landmarks batch of shape (num_faces, self.num_std_landmarks
, 2). These are landmark sets of all the desired faces to extract from the given batch of N images.
- Return type:
ndarray
- Returns:
A batch of aligned and center-cropped faces where the factor of the area of a face relative to the whole face image area is
self.face_factor
. The output is a numpy array of shape (N, H, W, 3) of type numpy.uint8
(same channel structure as for the input images). (H, W) is defined by self.output_size
.
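The transform-estimation step in crop_align() can be sketched with plain NumPy least squares. This is a simplified stand-in for the library's internal estimation, assuming allow_skew=False, i.e., the transform is restricted to rotation, uniform scale and translation (a similarity transform); the helper name and test points are illustrative only:

```python
import numpy as np

def estimate_similarity(src, dst):
    """Estimate a 2x3 similarity transform mapping src -> dst.

    Solves for scale*rotation + translation via linear least squares,
    parameterized as [a, b, tx, ty] with matrix [[a, -b, tx], [b, a, ty]].
    """
    n = len(src)
    A = np.zeros((2 * n, 4))
    rhs = np.zeros(2 * n)
    # x' = a*x - b*y + tx
    A[0::2, 0], A[0::2, 1], A[0::2, 2] = src[:, 0], -src[:, 1], 1
    # y' = b*x + a*y + ty
    A[1::2, 0], A[1::2, 1], A[1::2, 3] = src[:, 1], src[:, 0], 1
    rhs[0::2], rhs[1::2] = dst[:, 0], dst[:, 1]
    (a, b, tx, ty), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return np.array([[a, -b, tx], [b, a, ty]])

src = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [0.5, 0.5]])
dst = src * 2 + [3, 4]  # pure scale + translation
M = estimate_similarity(src, dst)
```

With allow_skew=True, a full affine transform (6 free parameters) would be estimated instead, which can additionally stretch and shear the face.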
- process_batch(file_names, input_dir, output_dir)[source]
Extracts faces from a batch of images and saves them.
Takes file names and an input directory, reads the images, extracts the faces, and saves them to the output directory. This method works as follows:
Batch generation - a batch of images from the given file names is generated. Each image is padded and resized to
self.resize_size
while keeping the same aspect ratio.
Landmark detection - the detection model is used to predict 5 landmarks for each face in each image, unless the landmarks were already initialized or face alignment + cropping is not needed.
Image enhancement - some images are enhanced if the faces are small relative to the image size. If landmarks are None, i.e., if no alignment + cropping was desired, all images are enhanced. Enhancement is not done if
self.enh_threshold
is None.
Image grouping - each face image is parsed, i.e., a map of face attributes is generated. Based on those attributes, each face image is put into a corresponding group. There may also be mask groups, in which case masks for each image in that group are also generated. Faces are not parsed if
self.attr_groups
and self.mask_groups
are both None.
Image saving - each face image (and a potential mask) is saved according to the group structure (if any).
Note
If the detection model is not used, then the batch is just a list of loaded images of different dimensions.
- Parameters:
file_names (
list
[str
]) – The list of image file names (not full paths). All the images should be in the same directory.
input_dir (
str
) – Path to input directory with image files.
output_dir (
str
) – Path to output directory to save the extracted face images.
- process_dir(input_dir, output_dir=None, desc='Processing')[source]
Processes images in the specified input directory.
Splits all the file names in the input directory to batches and processes batches on multiple cores. For every file name batch, images are loaded, some are optionally enhanced, landmarks are generated and used to optionally align and center-crop faces, and grouping is optionally applied based on face attributes. For more details, check
process_batch()
.
Note
There might be a few seconds delay before the actual processing starts if there are a lot of files in the directory - it takes some time to split all the file names to batches.
- Parameters:
input_dir (
str
) – Path to input directory with image files.
output_dir (
Optional
[str
]) – Path to output directory to save the extracted (and optionally grouped to sub-directories) face images. If None, then the same path as for input_dir
is used and additionally a “_faces” suffix is added to the name.
desc (
Optional
[str
]) – The description to use for the progress bar. If specified as None
, no progress bar is shown. Defaults to “Processing”.
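The batch-splitting step that process_dir() performs before dispatching work can be sketched with a small stand-in helper (hypothetical, not the library's actual function):

```python
def split_into_batches(file_names, batch_size):
    """Split a list of file names into consecutive batches of at most
    batch_size items; the last batch may be smaller."""
    return [file_names[i:i + batch_size]
            for i in range(0, len(file_names), batch_size)]

files = [f"img_{i}.jpg" for i in range(10)]
batches = split_into_batches(files, batch_size=4)
print([len(b) for b in batches])  # [4, 4, 2]
```

Each batch would then be handed to process_batch() on one of the worker processes.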
- save_group(faces, file_names, output_dir)[source]
Saves a group of images to output directory.
Takes in a batch of faces or masks as well as corresponding file names from where the faces were extracted and saves the faces/masks to a specified output directory with the same names as those image files (appends counter suffixes if multiple faces come from the same file). If the batch of face images/masks is empty, then the output directory is not created either.
- Parameters:
faces (
ndarray
) – Face images (cropped and aligned) represented as a numpy array of shape (N, H, W, 3) with values of type numpy.uint8
ranging from 0 to 255. It may also be a face mask batch of shape (N, H, W) with values of 255 where some face attribute is present and 0 elsewhere.
file_names (
list
[str
]) – The list of filenames of length N. Each face comes from a specific file whose name is also used to save the extracted face. If self.strategy
allows multiple faces to be extracted from the same file, such as “all”, counters are added at the end of the filenames.
output_dir (
str
) – The output directory to save faces
.
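The counter-suffix behavior for duplicate file names can be illustrated with a small stand-in helper (the library's exact naming scheme may differ; this sketch only shows the idea):

```python
import os
from collections import Counter

def unique_save_names(file_names):
    """Append counter suffixes when a file name repeats in the batch."""
    seen = Counter()
    out = []
    for name in file_names:
        stem, ext = os.path.splitext(name)
        if seen[name]:
            # Second and later faces from the same file get a counter.
            out.append(f"{stem}_{seen[name]}{ext}")
        else:
            out.append(name)
        seen[name] += 1
    return out

print(unique_save_names(["a.jpg", "a.jpg", "b.png"]))
# ['a.jpg', 'a_1.jpg', 'b.png']
```

This matters with strategy="all", where several faces extracted from one image must not overwrite each other on disk.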
- save_groups(faces, file_names, output_dir, attr_groups, mask_groups)[source]
Saves images (and masks) group-wise.
This method takes a batch of face images of equal dimensions, a batch of file names identifying which image each face comes from, and, optionally, attribute and/or mask groups telling how to split the face images (and masks) across different folders. This method then loops through all the groups and saves images accordingly.
- Example 1:
If neither
attr_groups
nor mask_groups
are provided, the face images will be saved according to this structure:

├── output_dir
|    ├── face_image_0.jpg
|    ├── face_image_1.png
|    ...
- Example 2:
If only
attr_groups
is provided (keys are names describing common attributes across faces in that group and they are also sub-directories of output_dir
), the structure is as follows:

├── output_dir
|    ├── attribute_group_1
|    |    ├── face_image_0.jpg
|    |    ├── face_image_1.png
|    |    ...
|    ├── attribute_group_2
|    ...
- Example 3:
If only
mask_groups
is provided (keys are names describing the mask type and they are also sub-directories of output_dir
), the structure is as follows:

├── output_dir
|    ├── group_1
|    |    ├── face_image_0.jpg
|    |    ├── face_image_1.png
|    |    ...
|    ├── group_1_mask
|    |    ├── face_image_0.jpg
|    |    ├── face_image_1.png
|    |    ...
|    ├── group_2
|    |    ...
|    ├── group_2_mask
|    |    ...
|    ...
- Example 4:
If both
attr_groups
and mask_groups
are provided, then all images and masks will first be grouped by attributes and then by mask groups. The structure is then as follows:

├── output_dir
|    ├── attribute_group_1
|    |    ├── group_1
|    |    |    ├── face_image_0.jpg
|    |    |    ├── face_image_1.png
|    |    |    ...
|    |    ├── group_1_mask
|    |    |    ├── face_image_0.jpg
|    |    |    ├── face_image_1.png
|    |    |    ...
|    |    ├── group_2
|    |    |    ...
|    |    ├── group_2_mask
|    |    |    ...
|    |    ...
|    ├── attribute_group_2
|    |    ...
|    ...
- Parameters:
faces (
ndarray
) – Face images (cropped and aligned) represented as a numpy array of shape (N, H, W, 3) with values of type numpy.uint8
ranging from 0 to 255.
file_names (
ndarray
) – File names of images from which the faces were extracted. This value is a numpy array of shape (N,) with values of type numpy.str_
. Each nth face in faces
maps to exactly one nth file name in this array; thus there may be duplicate file names (because different faces may come from the same file).
output_dir (
str
) – The output directory where the faces or folders of faces will be saved.
attr_groups (
Optional
[dict
[str
,list
[int
]]]) – Face groups by attributes. Each key represents the group name (describing common attributes across faces) and each value is a list of indices identifying faces (from faces) that should go to that group.
mask_groups (
Optional
[dict
[str
,tuple
[list
[int
],ndarray
]]]) – Face groups by extracted masks. Each key represents the group name (describing the mask type) and each value is a tuple where the first element is a list of indices identifying faces (from faces
) that should go to that group and the second element is a batch of masks corresponding to the indexed faces, represented as numpy arrays of shape (N, H, W) with values of type numpy.uint8
and being either 0 (negative) or 255 (positive).
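The positive/negative index convention behind attr_groups can be illustrated with a small stand-in helper (hypothetical, not the library's actual grouping code; the attribute indices used below follow the example values from __init__()):

```python
def matches_group(face_attrs, group_spec):
    """Check whether a face belongs to an attribute group.

    face_attrs: set of attribute indices present in the face.
    group_spec: list of indices; a positive index means the attribute
    must be present, a negative one means the attribute (its absolute
    value) must be absent.
    """
    for idx in group_spec:
        if idx >= 0 and idx not in face_attrs:
            return False
        if idx < 0 and -idx in face_attrs:
            return False
    return True

# A face with glasses (6) and an earring (9):
face = {6, 9}
print(matches_group(face, [6]))            # True: in the "glasses" group
print(matches_group(face, [-6, -9, -15]))  # False: not "no_accessories"
```

A face may satisfy several group specifications at once, in which case it would be saved into each matching sub-directory.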