BiSeNet
- class face_crop_plus.models.bise.BiSeNet(attr_groups=None, mask_groups=None, max_batch_size=8)[source]
Bases: Module, LoadMixin
Face attribute parser.
This class is capable of predicting scores for 19 attributes for face images. After it identifies the closest attribute for each pixel, it can also assign the whole face image to a corresponding attribute or mask group.
The 19 attributes are as follows (attributes are indicated from the person's own perspective, meaning, for instance, the left eye is the eye on the right-hand side of the picture; however, sides are not always accurate):
0 - neutral
1 - skin
2 - left eyebrow
3 - right eyebrow
4 - left eye
5 - right eye
6 - eyeglasses
7 - left ear
8 - right ear
9 - earing
10 - nose
11 - mouth
12 - upper lip
13 - lower lip
14 - neck
15 - necklace
16 - clothes
17 - hair
18 - hat
Some examples of grouping by attributes:
- 'glasses': [6] - this will put each face image that contains pixels labeled as 6 to a category called 'glasses'.
- 'earings_and_necklace': [9, 15] - this will put each image that contains pixels labeled as 9 and also contains pixels labeled as 15 to a category called 'earings_and_necklace'.
- 'no_accessories': [-6, -9, -15, -18] - this will put each face image that does not contain pixels labeled as 6, 9, 15, or 18 to a category called 'no_accessories'.
Some examples of grouping by mask:
- 'nose': [10] - this will put each face image that contains pixels labeled as 10 to a category called 'nose' and generate a corresponding mask.
- 'eyes_and_eyebrows': [2, 3, 4, 5] - this will put each image that contains pixels labeled as 2, 3, 4, or 5 (or any combination of them) to a category called 'eyes_and_eyebrows' and generate a corresponding mask.
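The grouping rules above can be sketched in plain numpy. The `matches` helper and the `parse_map` label map are hypothetical illustrations (not the library's actual code), and the threshold of 5 mirrors the `attr_threshold` default described further below:

```python
import numpy as np

# Hypothetical label map for one face (normally produced by the model);
# values are the attribute indices listed above.
parse_map = np.zeros((64, 64), dtype=np.int64)
parse_map[10:20, 10:20] = 6   # a patch of "eyeglasses" pixels
parse_map[30:32, 30:32] = 17  # some "hair" pixels

attr_groups = {
    "glasses": [6],
    "earings_and_necklace": [9, 15],
    "no_accessories": [-6, -9, -15, -18],
}

def matches(parse_map, attrs, threshold=5):
    """True if the face satisfies every attribute in the list (joined by and).

    Positive indices must be present (more than `threshold` pixels);
    negative indices must be absent.
    """
    for a in attrs:
        count = int((parse_map == abs(a)).sum())
        present = count > threshold
        if (a >= 0) != present:
            return False
    return True

groups = [name for name, attrs in attr_groups.items() if matches(parse_map, attrs)]
print(groups)  # ['glasses']
```

Here only 'glasses' matches: the map has eyeglasses pixels, lacks earring/necklace pixels, and the eyeglasses pixels rule out 'no_accessories'.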
This class also inherits the load method from the LoadMixin class. The method takes a device on which to load the model and loads it with a default state dictionary read from the WEIGHTS_FILENAME file. It sets the model to eval mode and disables gradients. For more information on how the BiSeNet model works, see this repo: Face Parsing PyTorch. Most of the code was taken from that repository.
Note
Whenever an input shape is mentioned, N corresponds to batch size, C corresponds to the number of channels, H - to input height, and W - to input width.
By default, this class initializes the following attributes, which can be changed after initialization of the class (but, typically, should not be changed):
- attr_join_by_and
Whether a face image is added to an attribute group only if the face meets all the specified attributes in the list (joined by and), or if it meets at least one of the attributes (joined by or). Please read the definition of attr_groups to get a clearer picture. In most cases, this should be set to True - if the attributes in a group list are negative, this ensures the selected face matches none of the specified attributes. Also, if you want to join the attributes by or (any), separate single-attribute groups can be created and manually merged into one. Defaults to True.
- Type:
bool
- attr_threshold
Threshold, based on which an attribute is determined as present in the face image. For instance, if the threshold is 5, then at least 6 pixels must be labeled with the same attribute for that attribute to be considered present in the face image. Defaults to 5.
- Type:
int
- mask_threshold
Threshold, based on which a mask is considered a proper mask. For instance, if the threshold is 15, then face images with 15 or fewer pixels matching a specified mask group (face attributes) will be ignored, and no image-mask pair will be generated for that mask category. Defaults to 15.
- Type:
int
- mean
The list of mean values for each input channel. The pixel values should be shifted by those quantities during inference since this normalization was applied during training. Defaults to [0.485, 0.456, 0.406].
- Type:
list[float]
- std
The list of standard deviation values for each input channel. The pixel values should be scaled by those quantities during inference since this normalization was applied during training. Defaults to [0.229, 0.224, 0.225].
- Type:
list[float]
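The mean and std attributes describe the channel-wise normalization applied during training. A minimal numpy sketch of that transform (whether the library applies it internally or expects pre-normalized input is not stated here; the array shapes are illustrative):

```python
import numpy as np

# Channel-wise normalization constants from the class attributes above.
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

# Hypothetical RGB face batch with float values in [0, 255], shape (N, 3, H, W).
images = np.full((1, 3, 4, 4), 127.5)

# Scale to [0, 1], then shift by the per-channel mean and divide by the
# per-channel std, matching the normalization applied during training.
normalized = (images / 255.0 - mean[None, :, None, None]) / std[None, :, None, None]
```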
- WEIGHTS_FILENAME = 'bise_parser.pth'
The constant specifying the name of the .pth file from which the weights for this model should be loaded. Defaults to "bise_parser.pth".
- Type:
str
- __init__(attr_groups=None, mask_groups=None, max_batch_size=8)[source]
Initializes BiSeNet model.
First, it assigns the passed values as attributes. Then it initializes the BiSeNet layers required for face parsing, i.e., labeling face parts.
Note
Check the class definition for the possible face attribute values and examples of groups. Also note that the variables specified here are mainly relevant only for predict().
- Parameters:
attr_groups (Optional[dict[str, list[int]]]) – Dictionary specifying attribute groups, based on which the face images should be grouped. Each key represents an attribute group name, e.g., 'glasses', 'earings_and_necklace', 'no_accessories', and each value represents attribute indices, e.g., [6], [9, 15], [-6, -9, -15, -18], each index mapping to some attribute. Since this model labels face image pixels, if there are enough pixels with the specified values in the list, the whole face image is put into that attribute category. For negative values, it is checked that the labeled face image does not contain those (absolute) values. If None, there is no grouping according to attributes. Defaults to None.
mask_groups (Optional[dict[str, list[int]]]) – Dictionary specifying mask groups, based on which the face images and their masks should be grouped. Each key represents a mask group name, e.g., 'nose', 'eyes_and_eyebrows', and each value represents attribute indices, e.g., [10], [2, 3, 4, 5], each index mapping to some attribute. Since this model labels face image pixels, a mask is created with ones at pixels that match the specified attributes and zeros elsewhere. Note that negative values would make no sense here and would cause an error. If None, there is no grouping according to mask groups. Defaults to None.
max_batch_size (int) – The maximum batch size used when performing inference. There may be many faces in a single batch, thus splitting it into sub-batches for inference and then merging the predictions back is a way to deal with memory errors. This is a convenience variable because batch size typically corresponds to the number of images for a single inference, but the input given to predict() might have a larger batch size because it represents the number of faces, many of which can come from just a single image. Defaults to 8.
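The sub-batching behavior implied by max_batch_size can be sketched with a hypothetical helper (`split_into_sub_batches` is not part of the API; it only illustrates how a face batch larger than max_batch_size could be chunked):

```python
def split_into_sub_batches(num_faces, max_batch_size=8):
    """Illustrative helper: yields (start, end) index ranges that cover
    num_faces samples in chunks of at most max_batch_size."""
    for start in range(0, num_faces, max_batch_size):
        yield start, min(start + max_batch_size, num_faces)

# E.g., 19 detected faces with the default max_batch_size of 8:
print(list(split_into_sub_batches(19)))  # [(0, 8), (8, 16), (16, 19)]
```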
- forward(x)[source]
Performs forward pass.
Takes an input batch and performs inference based on the modules it has. The input is a batch of face images and the output is a corresponding batch of pixel-wise attribute scores.
- Parameters:
x (Tensor) – The input tensor of shape (N, 3, H, W).
- Return type:
Tensor
- Returns:
An output tensor of shape (N, 19, H, W) where each channel corresponds to a specific attribute and each value at (H, W) is an unbounded confidence score.
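Since the output holds an unbounded score per attribute channel, the per-pixel attribute label is obtained by taking the argmax over the channel dimension. A numpy sketch with a stand-in score tensor (the random scores are placeholders, not real model output):

```python
import numpy as np

# Stand-in for forward()'s output: unbounded scores of shape (N, 19, H, W).
scores = np.random.default_rng(0).normal(size=(2, 19, 4, 4))

# The predicted attribute at each pixel is the channel with the highest
# score, yielding integer labels in 0..18 with shape (N, H, W).
parse_preds = scores.argmax(axis=1)
```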
- group_by_attributes(parse_preds, attr_groups, offset)[source]
Groups parse predictions by face attributes.
Takes parse predictions for each face, where each pixel's integer value indicates an attribute, and extends the groups in the attribute dictionary to include more samples that match each group.
- Parameters:
parse_preds (Tensor) – Face parsing predictions of shape (N, H, W) with integer values indicating pixel categories.
attr_groups (dict[str, list[int]]) – The dictionary with keys corresponding to attribute group names (they match self.attr_groups keys) and values corresponding to indices that map face images from all batches of parse_preds to the corresponding group. This is the dictionary that is extended and returned.
offset (int) – The offset to add to each index. Originally, the indices correspond only to the face parsings in the current parse_preds batch; the offset generalizes each index by shifting it by the number of previously processed face parsings, i.e., the offset is the number of previous batches (of parse_preds) times the batch size.
- Return type:
dict[str, list[int]]
- Returns:
Similar to
attr_groups
, it is the dictionary with the same keys but values (which are lists of indices) may be extended with additional indices.
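An illustrative re-implementation of this grouping step in numpy (an assumption about the logic, not the library's code; `specs`, `extend_groups`, and the threshold of 5 are all hypothetical stand-ins):

```python
import numpy as np

# Attribute specs (what self.attr_groups holds) and the accumulator
# dictionary that gets extended with global sample indices.
specs = {"glasses": [6], "hat": [18]}
groups = {"glasses": [], "hat": []}
threshold = 5

def extend_groups(parse_preds, groups, offset):
    """Appends offset + local index for every face whose label map
    satisfies all attributes of a group (joined by and)."""
    for i, pred in enumerate(parse_preds):
        for name, attrs in specs.items():
            matches_all = all(
                (int((pred == abs(a)).sum()) > threshold) == (a >= 0)
                for a in attrs
            )
            if matches_all:
                groups[name].append(offset + i)
    return groups

batch = np.zeros((2, 16, 16), dtype=np.int64)
batch[0, :4, :4] = 6    # first face: 16 "eyeglasses" pixels
batch[1, :4, :4] = 18   # second face: 16 "hat" pixels
extend_groups(batch, groups, offset=8)  # pretend one batch of 8 came before
print(groups)  # {'glasses': [8], 'hat': [9]}
```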
- group_by_masks(parse_preds, mask_groups, offset)[source]
Groups parse predictions by face mask attributes.
Takes parse predictions for each face, where each pixel's integer value indicates a parse/mask group, and extends the groups in the mask dictionary to include more samples that match each group.
- Parameters:
parse_preds (Tensor) – Face parsing predictions of shape (N, H, W) with integer values indicating pixel categories.
mask_groups (dict[str, tuple[list[int], list[ndarray]]]) – The dictionary with keys corresponding to mask group names (they match self.mask_groups keys) and values corresponding to tuples where the first element is a list of indices that map face images from all batches of parse_preds to the corresponding group and the second element is a list of corresponding masks as numpy arrays of shape (H, W) of type numpy.uint8 with 255 at pixels that match the mask group specification and 0 elsewhere. This is the dictionary that is extended and returned.
offset (int) – The offset to add to each index. Originally, the indices correspond only to the face parsings in the current parse_preds batch; the offset generalizes each index by shifting it by the number of previously processed face parsings, i.e., the offset is the number of previous batches (of parse_preds) times the batch size.
- Return type:
dict[str, tuple[list[int], list[ndarray]]]
- Returns:
Similar to
mask_groups
, it is the dictionary with the same keys but values (which are tuples of a list of indices and a list of masks) may be extended with additional indices and masks.
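The mask construction described above can be sketched in numpy (an assumption about the behavior, not the library's code): a uint8 mask with 255 where the pixel label is in the mask group and 0 elsewhere, kept only if it clears the mask_threshold of 15 described earlier.

```python
import numpy as np

mask_attrs = [2, 3, 4, 5]  # e.g., an 'eyes_and_eyebrows' mask group
mask_threshold = 15

# Hypothetical (H, W) label map for one face.
pred = np.zeros((32, 32), dtype=np.int64)
pred[5:10, 5:10] = 4  # 25 "left eye" pixels

# 255 at pixels matching the mask group, 0 elsewhere.
mask = np.where(np.isin(pred, mask_attrs), 255, 0).astype(np.uint8)

# The image-mask pair is generated only if enough pixels matched.
keep = int((mask == 255).sum()) > mask_threshold  # True here (25 > 15)
```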
- predict(images)[source]
Predicts attribute and mask groups for face images.
This method takes a batch of face images and groups them according to the specifications in self.attr_groups and self.mask_groups. For more information on how it works, see this class' specification BiSeNet. It returns 2 group maps - one for grouping face images into different attribute categories, e.g., 'with glasses', 'no accessories', and the other for grouping images into different mask groups, e.g., 'nose', 'lips and mouth'.
- Parameters:
images (Tensor | list[Tensor]) – Image batch of shape (N, 3, H, W) in RGB form with float values from 0.0 to 255.0. It must be on the same device as this model. A list of tensors can also be provided; however, they all must have the same spatial dimensions so they can be stacked into a single batch.
- Returns:
attr_groups - each key represents an attribute category and each value is a list of indices indicating which samples from the images batch belong to that category. It can be None if self.attr_groups is None.
mask_groups - each key represents an attribute (mask) category and each value is a tuple where the first element is a list of indices indicating which samples from the images batch belong to that mask group and the second element is a corresponding batch of masks of shape (N, H, W) of type numpy.uint8 with values of either 0 or 255. The masks appear in the same order as the indices that indicate which face images to take for that mask group. It can be None if self.mask_groups is None.
- Return type:
A tuple of 2 dictionaries (either can be None)
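A hedged sketch of consuming the two returned group maps. The dictionary contents below are made-up placeholders standing in for real predict() output; only the shapes and the index/mask alignment follow the description above:

```python
import numpy as np

# Placeholder return values shaped like predict()'s output.
attr_groups = {"glasses": [0, 3]}
mask_groups = {"nose": ([1, 2], np.full((2, 64, 64), 255, dtype=np.uint8))}

# Stand-in for the input face batch given to predict().
images = np.zeros((4, 3, 64, 64), dtype=np.float32)

# Select the face images belonging to an attribute category by index list:
with_glasses = images[attr_groups["glasses"]]  # shape (2, 3, 64, 64)

# For mask groups, the index list and the mask batch are aligned:
nose_indices, nose_masks = mask_groups["nose"]
nose_faces = images[nose_indices]  # i-th face pairs with nose_masks[i]
```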