Segmenter
- class glasses_detector.segmenter.GlassesSegmenter(kind: str = 'smart', size: str = 'medium', weights: bool | str | None = True, device: str | device | None = None)
Bases: BaseGlassesModel
Binary segmenter for glasses and their parts.
This class performs binary segmentation of glasses or of their particular parts, e.g., frames, lenses, legs, shadows, etc. Specifically, it generates a mask of the same size as the input image, where each pixel is mapped to a value indicating whether it belongs to the positive category (e.g., glasses) or not.
Examples
Let’s instantiate the segmenter with default parameters (along with the imports used in the examples below):
>>> import subprocess
>>> import numpy as np
>>> from glasses_detector import GlassesSegmenter
>>> seg = GlassesSegmenter()
First, we can perform a raw prediction on an image given as a path, a PIL Image, or a numpy array. See predict() for more details.
>>> seg(np.random.randint(0, 256, size=(2, 2, 3), dtype=np.uint8), format="bool")
tensor([[False, False],
        [False, False]])
>>> type(seg(["path/to/image1.jpg", "path/to/image2.jpg"], format="img")[0])
<class 'PIL.Image.Image'>
We can also use the more specific method process_file(), which can save the results to a file:
>>> seg.process_file("path/to/img.jpg", "path/to/pred.jpg", show=True)  # opens a new image window with the overlaid mask
>>> seg.process_file(["img1.jpg", "img2.jpg"], "preds.npy", format="proba")
>>> np.load("preds.npy").shape
(2, 256, 256)
Finally, we can use process_dir() to process all images in a directory and save the predictions to a file or a directory:
>>> seg.process_dir("path/to/dir", "path/to/preds.csv", format="logit")
>>> subprocess.run(["cat", "path/to/preds.csv"])
img1.jpg,-0.234,-1.23,0.123,0.123,1.435,...
img2.jpg,0.034,-0.23,2.123,-1.123,0.435,...
...
>>> seg.process_dir("path/to/dir", "path/to/pred_dir", ext=".jpg", format="mask")
>>> subprocess.run(["ls", "path/to/pred_dir"])
img1.jpg img2.jpg ...
- Parameters:
kind (str, optional) – The kind of glasses/parts to perform binary segmentation for. Available options are:
    "frames": frames (including legs) of any glasses
    "full": frames (including legs) and lenses of any glasses
    "legs": legs of any glasses
    "lenses": lenses of any glasses
    "shadows": shadows cast on the skin by the glasses frames only
    "smart": like "full", but does not segment lenses if they are transparent
  Defaults to "smart".
size (str, optional) – The size of the model to use (check ALLOWED_SIZE_ALIASES for size aliases). Available options are:
    "small" or "s": very few parameters but lower accuracy
    "medium" or "m": a balance between the number of parameters and the accuracy
    "large" or "l": a large number of parameters but higher accuracy
  Please check:
    Performance of the Pre-trained Segmenters: to see the results of the pre-trained models for each size depending on kind.
    Size Information of the Pre-trained Segmenters: to see which architecture each size maps to and the details about the number of parameters.
  Defaults to "medium".
weights (bool | str | None, optional) – Whether to load the pre-trained weights. If True, the weights are loaded from a URL inferred from the model’s kind and size (or from a local file if they have already been downloaded). If a string is provided, it is used as a custom path or URL (determined automatically) to the model weights. Defaults to True.
device (str | torch.device | None, optional) – The device to cast the model to (once it is loaded). If None, it is automatically checked whether CUDA or MPS is available. Defaults to None.
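To make the size aliases and kind options above concrete, here is a minimal sketch of validating them before constructing a segmenter. It is an illustration under assumptions, not the library's code: the alias table is assembled only from the options listed above (the actual contents of ALLOWED_SIZE_ALIASES are not shown in this section), and resolve_size and check_kind are hypothetical helpers.

```python
# Hypothetical alias table assembled from the size options documented above;
# the library's own ALLOWED_SIZE_ALIASES may be structured differently.
SIZE_ALIASES: dict[str, str] = {
    "small": "small", "s": "small",
    "medium": "medium", "m": "medium",
    "large": "large", "l": "large",
}

# Segmentation kinds documented for GlassesSegmenter.
SEGMENTER_KINDS: frozenset[str] = frozenset(
    {"frames", "full", "legs", "lenses", "shadows", "smart"}
)


def resolve_size(size: str) -> str:
    """Map a size or its alias (e.g. "m") to the canonical size name."""
    if size not in SIZE_ALIASES:
        raise ValueError(f"Unknown size {size!r}; expected one of {sorted(SIZE_ALIASES)}")
    return SIZE_ALIASES[size]


def check_kind(kind: str) -> str:
    """Return kind unchanged if it is one of the documented options."""
    if kind not in SEGMENTER_KINDS:
        raise ValueError(f"Unknown kind {kind!r}; expected one of {sorted(SEGMENTER_KINDS)}")
    return kind
```

For example, resolve_size("m") returns "medium", while an undocumented value such as "xl" raises ValueError.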
- static draw_masks(image: Image | ndarray | Tensor, masks: Image | list[Image] | ndarray | Tensor, alpha: float = 0.5, colors: str | tuple[int, int, int] | list[str | tuple[int, int, int]] | None = 'red') → Image
Draws mask(-s) over an image.
Takes the original image and a mask or a list of masks and overlays them on the image with the specified colors and transparency.
See also
draw_segmentation_masks() for more details about how the masks are drawn.
to_image() for more details about the expected formats if the input image and the masks are of type PIL.Image.Image or numpy.ndarray.
- Parameters:
image (PIL.Image.Image | numpy.ndarray | torch.Tensor) – The original image. It can be either a PIL Image, a numpy ndarray of shape (H, W, 3) or (H, W) and type uint8, or a torch Tensor of shape (3, H, W) or (H, W) and type uint8.
masks (PIL.Image.Image | list[PIL.Image.Image] | numpy.ndarray | torch.Tensor) – The mask or a list of masks to draw over the image. It can be either a PIL Image or a list of them, a numpy ndarray of shape (H, W) or (N, H, W) and type uint8 or bool_, or a torch Tensor of shape (H, W) or (N, H, W) and type uint8 or bool. Note: N is the number of masks.
alpha (float, optional) – A float between 0 and 1 denoting the transparency of the masks: 0 means full transparency, 1 means no transparency. Defaults to 0.5.
colors (str | tuple[int, int, int] | list[str | tuple[int, int, int]] | None, optional) – A list containing the colors of the masks, or a single color for all masks. A color can be represented as a PIL string, e.g., "red" or "#FF00FF", or as an RGB tuple, e.g., (240, 10, 157). If None, random colors are generated for each mask. Defaults to "red".
- Returns:
The RGB image with the mask(-s) drawn over it.
- Return type:
PIL.Image.Image
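The role of alpha can be illustrated as a per-pixel weighted average between each masked pixel and the mask color, which is the blending rule draw_segmentation_masks() is understood to apply. The sketch below is a simplified, dependency-free illustration under that assumption: blend_mask is a hypothetical helper, images are nested lists of RGB tuples rather than PIL images or tensors, and only a single mask with a single color is handled.

```python
Pixel = tuple[int, int, int]


def blend_mask(
    image: list[list[Pixel]],
    mask: list[list[bool]],
    color: Pixel,
    alpha: float = 0.5,
) -> list[list[Pixel]]:
    """Overlay a single boolean mask on an RGB image given as nested lists."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    blended: list[list[Pixel]] = []
    for image_row, mask_row in zip(image, mask):
        new_row: list[Pixel] = []
        for pixel, positive in zip(image_row, mask_row):
            if positive:
                # alpha = 0 keeps the original pixel, alpha = 1 replaces it with the color.
                r, g, b = (round((1 - alpha) * p + alpha * c) for p, c in zip(pixel, color))
                new_row.append((r, g, b))
            else:
                # Pixels outside the mask are left untouched.
                new_row.append(pixel)
        blended.append(new_row)
    return blended
```

With alpha=0.5, a black masked pixel overlaid with (200, 0, 100) becomes (100, 0, 50), while unmasked pixels keep their original values.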
- predict(image: FilePath | Image | ndarray, format: str | dict[bool, Default] | Callable[[Tensor], Default] | Callable[[Image, Tensor], Default] = 'img', output_size: tuple[int, int] | None = None, input_size: tuple[int, int] | None = (256, 256)) → Default
- predict(image: Collection[FilePath | Image | ndarray], format: str | dict[bool, Default] | Callable[[Tensor], Default] | Callable[[Image, Tensor], Default] = 'img', output_size: tuple[int, int] | None = None, input_size: tuple[int, int] | None = (256, 256)) → list[Default]
Predicts which pixels in the image are positive.
Takes a path or multiple paths to image files, or the loaded images themselves, and outputs a formatted prediction indicating the semantic mask of the present glasses or their specific part(-s). The format of the prediction, i.e., the prediction type, is the Default type, which corresponds to DEFAULT.
Warning
If the image is provided as numpy.ndarray, make sure the last dimension specifies the channels, i.e., the last dimension should be of size 1 or 3. If it is anything else, e.g., if the shape is (3, H, W), where W is neither 1 nor 3, the array will be interpreted as 3 grayscale images.
- Parameters:
image (FilePath | PIL.Image.Image | numpy.ndarray | Collection[FilePath | PIL.Image.Image | numpy.ndarray]) – The path(-s) to the image(-s) to generate the prediction for, or the image(-s) themselves represented as Image or ndarray. Note that the image should have values between 0 and 255 and be in RGB format. Normalization is not needed, as the channels are automatically normalized before being passed through the network.
format (str | dict[bool, Default] | Callable[[torch.Tensor], Default] | Callable[[PIL.Image.Image, torch.Tensor], Default], optional) – The string specifying how to map the predictions (pixel scores) to outputs (masks) of a specific format. The options are listed below (if image is a Collection, the output will be a list of corresponding items of the output type):
    format | output type | prediction mapping
    "bool" | torch.Tensor of type torch.bool of shape (H, W) | True for positive pixels, False for negative pixels
    "int" | torch.Tensor of type torch.int64 of shape (H, W) | 1 for positive pixels, 0 for negative pixels
    "logit" | torch.Tensor of type torch.float32 of shape (H, W) | Raw score (real number) of being a positive pixel
    "proba" | torch.Tensor of type torch.float32 of shape (H, W) | Probability (a number between 0 and 1) of being a positive pixel
    "mask" | PIL.Image.Image of mode "L" (grayscale) | White for positive pixels, black for negative pixels
    "img" | PIL.Image.Image of mode "RGB" (RGB) | The original image with the mask overlaid on it using the default values in draw_masks()
  It is also possible to provide a dictionary with 2 keys, True and False, mapping to the values to output when a predicted pixel is positive or negative, respectively. Further, a custom callback function can be given that specifies how to map the original image (Image) and the mask prediction (Tensor of type torch.float32 of shape (H, W)), or just the mask prediction, to a formatted Default output. Defaults to "img".
output_size (tuple[int, int] | None, optional) – The size (width, height), or (W, H), to resize the prediction (output mask) to. If None, the prediction will have the same size as the input image. Defaults to None.
input_size (tuple[int, int] | None, optional) – The size (width, height), or (W, H), to resize the image to before passing it through the network. If None, the image will not be resized. It is recommended to resize it to the size the model was trained on, which by default is (256, 256). Defaults to (256, 256).
- Returns:
The formatted prediction, or a list of formatted predictions if multiple images were provided.
- Return type:
Default | list[Default]
- Raises:
ValueError – If format is given as a string that is not recognized.
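The format table above can be read as a series of transformations of the raw per-pixel scores. The following dependency-free sketch assumes that probabilities are obtained with a sigmoid and that a pixel counts as positive when its probability exceeds 0.5 (equivalently, when its logit is positive); neither detail is stated in this section. format_prediction is a hypothetical helper that works on nested lists instead of tensors or PIL images, and it omits the "img" format and callback formats.

```python
import math
from typing import Any


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping a raw score to [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def format_prediction(
    logits: list[list[float]], fmt: Any = "proba", threshold: float = 0.5
) -> list[list[Any]]:
    """Map a 2-D grid of raw pixel scores to one of the documented output formats.

    Assumes a pixel is positive when its probability exceeds `threshold`.
    """
    if fmt == "logit":
        return [list(row) for row in logits]
    probas = [[sigmoid(v) for v in row] for row in logits]
    if fmt == "proba":
        return probas
    positive = [[p > threshold for p in row] for row in probas]
    if isinstance(fmt, dict):
        # Dictionary format: {True: value_for_positive, False: value_for_negative}.
        return [[fmt[flag] for flag in row] for row in positive]
    if fmt == "bool":
        return positive
    if fmt == "int":
        return [[int(flag) for flag in row] for row in positive]
    if fmt == "mask":
        # Grayscale intensities: white (255) for positive, black (0) for negative.
        return [[255 if flag else 0 for flag in row] for row in positive]
    raise ValueError(f"Unrecognized format: {fmt!r}")
```

For instance, the logits [[-2.0, 0.0], [3.0, -0.5]] map to [[False, False], [True, False]] with "bool" and to [[0, 0], [255, 0]] with "mask".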
- classmethod from_model(model: Module, **kwargs) → Self
Creates a glasses model from a custom torch.nn.Module.
Creates a glasses model wrapper for a custom torch.nn.Module instead of creating a predefined one based on kind and size.
Note
Make sure the provided model’s forward method behaves as expected, i.e., returns the prediction in the expected format for compatibility with predict().
Warning
The model_info property will not be useful, as it returns an empty dictionary for a custom kind and size (if they are specified at all).
- Parameters:
model (torch.nn.Module) – The custom model that will be assigned as model.
**kwargs – Keyword arguments to pass to the constructor; check the documentation of this class for more details. If task, kind, and size are not provided, they will be set to "custom". If the model architecture is custom, you may still specify the path to the pretrained weights via the weights argument. Finally, if device is not provided, the model will remain on its current device.
- Returns:
The glasses model wrapper of the same class type from which this method was called, wrapping the provided custom model.
- Return type:
Self
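The keyword defaults described above amount to a simple fill-in rule. The sketch below mirrors only that documented rule: fill_custom_kwargs is a hypothetical helper, not part of glasses_detector, and it does not construct or move any model.

```python
from typing import Any


def fill_custom_kwargs(**kwargs: Any) -> dict[str, Any]:
    """Fill missing constructor keywords as documented for from_model()."""
    resolved = dict(kwargs)
    for key in ("task", "kind", "size"):
        # Identifiers that were not provided fall back to "custom".
        resolved.setdefault(key, "custom")
    # "device" is deliberately not filled in: when it is absent, the wrapped
    # model is documented to stay on whatever device it is already on.
    return resolved
```

Other keywords, such as a weights path for a custom architecture, pass through unchanged.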
- load_weights(path_or_url: str | bool = True)
Loads the inner model weights.
Takes a path or a URL to the weights file, or True to construct the URL automatically based on model_info, and loads the weights into model.
Note
If the weights are already downloaded, they will be loaded from the hub cache, which by default is ~/.cache/torch/hub/checkpoints.
Warning
If the fields in model_info are not recognized, e.g., because an unrecognized kind or size was provided or the model was initialized with from_model(), this method will not be able to construct the URL (if path_or_url is True) and will raise a warning.
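The note and warning above imply a small decision procedure for locating weights. The following sketch is a hedged illustration, not the library's implementation: it assumes a string counts as a URL when it has an http(s) scheme and a host (the class documentation only says the distinction is made automatically), and that cached files are named after the last path segment of the URL, which is how torch.hub.load_state_dict_from_url names its downloads. All helper names are hypothetical.

```python
import os
from urllib.parse import urlparse

# Default torch hub checkpoint cache mentioned in the note above.
DEFAULT_CACHE_DIR = os.path.join("~", ".cache", "torch", "hub", "checkpoints")


def looks_like_url(path_or_url: str) -> bool:
    """Heuristic: treat strings with an http(s) scheme and a host as URLs."""
    parts = urlparse(path_or_url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def cached_checkpoint_path(url: str, cache_dir: str = DEFAULT_CACHE_DIR) -> str:
    """Where a checkpoint downloaded from `url` would be cached.

    The file is named after the last segment of the URL path (query ignored).
    """
    filename = os.path.basename(urlparse(url).path)
    return os.path.join(os.path.expanduser(cache_dir), filename)


def weights_source(path_or_url: str, cache_dir: str = DEFAULT_CACHE_DIR) -> str:
    """Describe where weights would come from: local file, cache hit, or download."""
    if not looks_like_url(path_or_url):
        return "local file"
    if os.path.isfile(cached_checkpoint_path(path_or_url, cache_dir)):
        return "cache"
    return "download"
```

Under these assumptions, a second call with the same URL would report "cache" once the first download has been stored in the cache directory.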
- process_dir(input_path: FilePath, output_path: FilePath | None = None, ext: str | None = None, batch_size: int = 1, show: bool = False, pbar: bool | str | tqdm = True, update_total: bool = True, **pred_kwargs) → dict[str, Default | None] | None
Processes a directory of images.
Takes a path to a directory of images, optionally groups them into batches, generates the predictions for every image, and either returns them (if output_path is None) or saves them to a specified file or as files in a specified directory. The following cases are considered:
    If output_path is None, the predictions are returned as a dictionary where the keys are the names of the images and the values are the corresponding predictions.
    If output_path is a single file, the predictions are aggregated to a single file.
    If output_path is a directory, the predictions are saved to that directory. For each input path, a corresponding file is created in the specified output directory with the same name as the input. The extension, if not provided as ext, is set automatically as explained in process_file().
For more details on how each file type is saved, regardless of whether it is a single prediction or the aggregated predictions, see save().
NB: aggregation of images to a single file/dictionary differs from that of process_file() (when multiple file paths are passed): here, only the names of the images are used as keys, not the full paths.
Tip
For very large directories, consider specifying output_path as a directory, because aggregating the predictions to a single file or waiting for them to be returned might consume too much memory and lead to errors.
Note
Any files in the input directory that are not valid images, or for which the prediction fails for any reason, are simply skipped and a warning is raised; for more details, see process_file().
- Parameters:
input_path (FilePath) – The path to a directory of images to generate predictions for.
output_path (FilePath | None, optional) – The path to save the prediction(-s) to. If None, the predictions are returned as a dictionary; if a single file, the predictions are aggregated to a single file; and if a directory, the predictions are saved to that directory with the names copied from the inputs. Defaults to None.
ext (str | None, optional) – The extension to use for the output file(-s). Only used when output_path is a directory. The extension should include a leading dot, e.g., ".txt", ".npy", ".jpg", etc. (see save()). If None, the behavior follows process_file(). Defaults to None.
batch_size (int, optional) – The batch size to use when processing the images. This groups the files in the specified directory into batches of size batch_size before processing them. In some cases, larger batch sizes can speed up processing at the cost of higher memory usage. Defaults to 1.
show (bool, optional) – Whether to show the predictions. Images will be shown using PIL.Image.Image.show() and other predictions will be printed to stdout. Setting this to True is not recommended, as it might spam your stdout. Defaults to False.
pbar (bool | str | tqdm, optional) – Whether to show a progress bar. If True, a progress bar with no description is shown. If str, a progress bar with the given description is shown. If an instance of tqdm, it is used as is. Defaults to True.
update_total (bool, optional) – Whether to update the total number of files in the progress bar. This is only relevant if pbar is an instance of tqdm. For example, if the total number of files is already known and captured by tqdm.tqdm.total, there is no need to update it. Defaults to True.
**pred_kwargs – Additional keyword arguments to pass to predict().
- Returns:
The dictionary of predictions if output_path is None, or None if output_path is specified.
- Return type:
dict[str, Default | None] | None
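Three mechanics from the description above (grouping files into batches of batch_size, keying aggregated predictions by image name, and naming per-image outputs in an output directory) can be sketched in plain Python. The helpers below are hypothetical and assume that "the same name as the input" means the input's file name with its extension replaced by ext, as in the class-level process_dir() example; they do not run any predictions.

```python
import os
from collections.abc import Iterator, Sequence


def batched(paths: Sequence[str], batch_size: int = 1) -> Iterator[list[str]]:
    """Yield consecutive groups of at most `batch_size` paths."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(paths), batch_size):
        yield list(paths[start:start + batch_size])


def prediction_key(input_path: str) -> str:
    """Key used when aggregating directory predictions: the file name only."""
    return os.path.basename(input_path)


def output_file_for(input_path: str, output_dir: str, ext: str) -> str:
    """Output file in `output_dir` with the input's name and the extension `ext`."""
    stem, _ = os.path.splitext(os.path.basename(input_path))
    return os.path.join(output_dir, stem + ext)
```

The last batch may be smaller than batch_size, e.g. three files with batch_size=2 give batches of two and one.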
- process_file(input_path: FilePath | Collection[FilePath], output_path: FilePath | Collection[FilePath] | None = None, ext: str | None = None, show: bool = False, **pred_kwargs) → Default | None | list[Default | None]
Processes a single image or a list of images.
Takes a path to an image or a list of paths to images, generates the prediction(-s), and returns them, based on how predict() behaves. If the output path is specified, the prediction(-s) will be saved to the given path(-s) based on the extension of the output path. The following cases are considered:
    If output_path is None, no predictions are saved. If there are multiple output paths (one for each input path) and some of the entries are None, then only the predictions corresponding to those entries are not saved.
    If the output path is a single file, then the predictions are saved to that file. If there are multiple input paths, then the corresponding predictions are aggregated to a single file.
    If output_path is a directory, then the prediction(-s) are saved to that directory. For each input path, a corresponding file is created in the specified output directory with the same name as the input. The extension, if not provided as ext, is set to .jpg for images and .txt for other predictions.
    If output_path is a list of output paths, then the predictions are saved to the corresponding output paths. If the numbers of input paths and output paths do not match, the output paths are truncated or padded with None to match the number of input paths, and a warning is raised. All output paths are interpreted as files.
For more details on how each file type is saved, regardless of whether it is a single prediction or the aggregated predictions, see save().
NB: aggregation of multiple images to a single file differs from that of process_dir(): here, the full paths are used as sample identifiers, not just the names of the images.
Tip
If multiple images are provided (as a list of input paths), they are likely to be loaded into a single batch for faster prediction (see predict() for more details), so more memory is required than if they were processed individually. For this reason, consider not passing too many images at once (e.g., fewer than 200).
Note
If some input path does not lead to a valid image file, e.g., the file does not exist, its prediction is set to None. Also, if at least one prediction fails, then all predictions are set to None. In both cases, a warning is raised and the files or the lines in the aggregated file are skipped (not saved).
- Parameters:
input_path (FilePath | Collection[FilePath]) – The path to an image or a list of paths to images to generate predictions for.
output_path (FilePath | Collection[FilePath] | None, optional) – The path to save the prediction(-s) to. If None, no predictions are saved. If a single file, the predictions are aggregated (if multiple) and saved to that file. If a directory, the predictions are saved to that directory with the names copied from the inputs. Defaults to None.
ext (str | None, optional) – The extension to use for the output file(-s). Only used when output_path is a directory. If None, the extension is set to ".jpg" for images and ".txt" for other predictions (depending on what predict() returns). For available options, refer to save(). Defaults to None.
show (bool, optional) – Whether to show the predictions. Images will be shown using PIL.Image.Image.show() and other predictions will be printed to stdout. Defaults to False.
**pred_kwargs – Additional keyword arguments to pass to predict().
- Returns:
The prediction or a list of predictions for the given image(-s). Any failed predictions will be set to None.
- Return type:
Default | None | list[Default | None]
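The rule for mismatched input and output path lists can be sketched as follows. pair_output_paths is a hypothetical helper that implements only the documented truncate-or-pad behavior, under the reading that it is the list of output paths that gets adjusted; it does not save anything.

```python
import warnings
from collections.abc import Sequence
from typing import Optional


def pair_output_paths(
    input_paths: Sequence[str], output_paths: Sequence[Optional[str]]
) -> list[Optional[str]]:
    """Return exactly one output path (or None) per input path.

    Extra output paths are dropped; missing ones are filled with None,
    meaning the corresponding predictions are not saved.
    """
    n_inputs, n_outputs = len(input_paths), len(output_paths)
    if n_outputs != n_inputs:
        warnings.warn(
            f"Got {n_outputs} output paths for {n_inputs} input paths; "
            "truncating or padding with None."
        )
    paired = list(output_paths[:n_inputs])
    paired.extend([None] * (n_inputs - len(paired)))
    return paired
```

Padding with None fits the first case listed above: entries that are None simply mean the corresponding predictions are not saved.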