Detector#

class glasses_detector.detector.GlassesDetector(kind: str = 'worn', size: str = 'medium', weights: bool | str | None = True, device: str | device | None = None)[source]#

Bases: BaseGlassesModel

Binary detector to check where the glasses are in the image.

This class allows performing binary glasses and eye-area detection. By binary, it is meant that only a single class is detected. It is possible to specify a particular kind of detection to perform, e.g., standalone glasses, worn glasses, or just the eye area.

Important

The detector cannot determine whether or not the glasses are present in the image, i.e., it will always try to predict a bounding box. If you are not sure whether glasses are present at all, please additionally use GlassesClassifier.
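
For instance, a minimal guard might classify first and detect only afterwards (a sketch; it assumes GlassesClassifier supports format="bool", and the image path is a placeholder):

>>> from glasses_detector import GlassesClassifier, GlassesDetector
>>> clf, det = GlassesClassifier(), GlassesDetector()
>>> if clf("path/to/img.jpg", format="bool"):
...     boxes = det("path/to/img.jpg", format="int")  # detect only if glasses are present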

Warning

The pre-trained models are trained on datasets that contain just a single bounding box per image. For this reason, the number of predicted bounding boxes will always be 1. If you want to detect multiple objects in the image, please train custom models on custom datasets or share those datasets with me :).

Note

If you want to use a custom inner model, e.g., by instantiating through from_model(), please ensure that, during inference in evaluation mode, it outputs a list of dictionaries (one for each image in the batch) with at least the key "boxes", which corresponds to the bounding boxes of the detected objects.
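
For example, a toy module satisfying this contract (purely illustrative; the constant box below is made up) could be wrapped as follows:

>>> import torch
>>> from glasses_detector import GlassesDetector
>>> class ToyDetector(torch.nn.Module):
...     def forward(self, images):
...         # One dict per image in the batch; "boxes" is a (K, 4) float tensor
...         return [{"boxes": torch.tensor([[0.0, 0.0, 1.0, 1.0]])} for _ in images]
...
>>> det = GlassesDetector.from_model(ToyDetector(), weights=False)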


Performance of the Pre-trained Detectors

| Kind | Size   | MSLE ↓ | F1 ↑ | R² ↑   | IoU ↑  |
|------|--------|--------|------|--------|--------|
| eyes | small  | 0.0531 | 1.0  | 0.9360 | 0.6188 |
| eyes | medium | 0.0523 | 1.0  | 0.9519 | 0.6272 |
| eyes | large  | TBA    | 1.0  | TBA    | TBA    |
| solo | small  | 0.4787 | 1.0  | 0.8328 | 0.6485 |
| solo | medium | 0.8282 | 1.0  | 0.9267 | 0.7731 |
| solo | large  | TBA    | 1.0  | TBA    | TBA    |
| worn | small  | 0.2585 | 1.0  | 0.8128 | 0.5427 |
| worn | medium | 0.1352 | 1.0  | 0.9432 | 0.7568 |
| worn | large  | TBA    | 1.0  | TBA    | TBA    |

NB: the F1 score is uninformative because there is only one class, but it is kept to emphasize this fact: not even the background is treated as a class, and a bounding box will always be predicted.

Size Information of the Pre-trained Detectors

| Size   | Architecture        | Params | GFLOPs | Memory (MB) | Filesize (MB) |
|--------|---------------------|--------|--------|-------------|---------------|
| small  | Tiny Detector       | 0.23M  | 0.001  | 33.99       | 0.91          |
| medium | SSD Lite [2, 3]     | 3.71M  | 0.51   | 316.84      | 14.46         |
| large  | Faster R-CNN [3, 4] | TBA    | TBA    | TBA         | TBA           |

Examples

Let’s instantiate the detector with default parameters:

>>> from glasses_detector import GlassesDetector
>>> det = GlassesDetector()

First, we can perform a raw prediction on an image expressed as either a path, a PIL Image or a numpy array. See predict() for more details.

>>> import numpy as np
>>> det(np.random.randint(0, 256, size=(16, 16, 3), dtype=np.uint8), format="int")
[[0, 0, 1, 1]]
>>> det(["path/to/image1.jpg", "path/to/image2.jpg"], format="str")
'BBoxes: 12 34 56 78; 90 12 34 56'

We can also use the more specific method process_file(), which allows saving the results to a file:

>>> det.process_file("path/to/img.jpg", "path/to/pred.jpg", show=True)
... # opens a new image window with drawn bboxes
>>> det.process_file(["img1.jpg", "img2.jpg"], "preds.npy", format="bool")
>>> np.load("preds.npy").shape
(2, 256, 256)

Finally, we can also use process_dir() to process all images in a directory and save the predictions to a file or a directory:

>>> import subprocess
>>> det.process_dir("path/to/dir", "path/to/preds.json", format="float")
>>> subprocess.run(["cat", "path/to/preds.json"])
{
    "img1.jpg": [[0.1, 0.2, 0.3, 0.4]],
    "img2.jpg": [[0.5, 0.6, 0.7, 0.8], [0.2, 0.8, 0.4, 0.9]],
    ...
}
>>> det.process_dir("path/to/dir", "path/to/pred_dir", ext=".jpg")
>>> subprocess.run(["ls", "path/to/pred_dir"])
img1.jpg img2.jpg ...
Parameters:
  • kind (str, optional) –

    The kind of objects to perform the detection for. Available options are:

    "eyes"

    No glasses, just the eye area

    "solo"

    Any standalone glasses in the wild

    "worn"

    Any glasses that are worn by people

    Categories are not very strict, for example, "worn" may also detect glasses on the table. Defaults to "worn".

  • size (str, optional) –

    The size of the model to use (check ALLOWED_SIZE_ALIASES for size aliases). Available options are:

    "small" or "s"

    Very few parameters but lower accuracy

    "medium" or "m"

    A balance between the number of parameters and the accuracy

    "large" or "l"

    Large number of parameters but higher accuracy

    Please check the performance and size information tables above for more details about each size.

    Defaults to "medium".

  • weights (bool | str | None, optional) – Whether to load the pre-trained weights from a default URL (or from a local file if they are already downloaded), inferred based on the model's kind and size. If a string is provided, it will be used as a custom path or a URL (determined automatically) to the model weights. Defaults to True.

  • device (str | torch.device | None, optional) – Device to cast the model to (once it is loaded). If specified as None, it will be automatically checked if CUDA or MPS is supported. Defaults to None.

static draw_boxes(image: Image | ndarray | Tensor, boxes: list[list[int | float]] | ndarray | Tensor, labels: list[str] | None = None, colors: str | tuple[int, int, int] | list[str | tuple[int, int, int]] | None = 'red', fill: bool = False, width: int = 3, font: str | None = None, font_size: int | None = None) Image[source]#

Draws bounding boxes on the image.

Takes the original image and the bounding boxes and draws them on the image. Optionally, labels can be provided to write next to each bounding box.

Parameters:
  • image (PIL.Image.Image | numpy.ndarray | torch.Tensor) – The original image. It can be either a PIL Image, a numpy ndarray of shape (H, W, 3) or (H, W) and type uint8 or a torch Tensor of shape (3, H, W) or (H, W) and type uint8.

  • boxes (list[list[int | float]] | numpy.ndarray | torch.Tensor) – The bounding boxes to draw. The expected shape is (N, 4) where N is the number of bounding boxes and the last dimension corresponds to the coordinates of the bounding box in the following order: x_min, y_min, x_max, y_max.

  • labels (list[str] | None, optional) – The labels corresponding to N bounding boxes. If None, no labels will be written next to the drawn bounding boxes. Defaults to None.

  • colors (list[str | tuple[int, int, int]] | str | tuple[int, int, int] | None, optional) – List containing the colors of the boxes or single color for all boxes. The color can be represented as PIL strings e.g. “red” or “#FF00FF”, or as RGB tuples e.g. (240, 10, 157). If None, random colors are generated for boxes. Defaults to "red".

  • fill (bool, optional) – If True, fills the bounding box with the specified color. Defaults to False.

  • width (int, optional) – Width of bounding box used when calling rectangle(). Defaults to 3.

  • font (str | None, optional) – A filename containing a TrueType font. If the file is not found in this filename, the loader may also search in other directories, such as the fonts/ directory on Windows or /Library/Fonts/, /System/Library/Fonts/ and ~/Library/Fonts/ on macOS. Defaults to None.

  • font_size (int | None, optional) – The requested font size in points used when calling truetype(). Defaults to None.

Returns:

The image with bounding boxes drawn on it.

Return type:

PIL.Image.Image
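
For illustration, the method can be called directly, e.g., on a blank image (all values below are arbitrary):

>>> import numpy as np
>>> from glasses_detector import GlassesDetector
>>> img = np.zeros((128, 128, 3), dtype=np.uint8)
>>> out = GlassesDetector.draw_boxes(
...     img, [[16, 16, 96, 64]], labels=["glasses"], colors="lime", width=2
... )
>>> out.show()  # a PIL image with one labeled box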

predict(image: FilePath | Image | ndarray, format: str | Callable[[Tensor], Default] | Callable[[Image, Tensor], Default] = 'img', output_size: tuple[int, int] | None = None, input_size: tuple[int, int] | None = (256, 256)) Default[source]#
predict(image: Collection[FilePath | Image | ndarray], format: str | Callable[[Tensor], Default] | Callable[[Image, Tensor], Default] = 'img', output_size: tuple[int, int] | None = None, input_size: tuple[int, int] | None = (256, 256)) list[Default]

Predicts the bounding box(-es).

Takes a path or multiple paths to image files, or the loaded images themselves, and outputs a formatted prediction for each image, indicating the location of the object (typically, glasses). The format of the prediction, i.e., the prediction type, is Default, which corresponds to DEFAULT.

Warning

If the image is provided as numpy.ndarray, make sure the last dimension specifies the channels, i.e., last dimension should be of size 1 or 3. If it is anything else, e.g., if the shape is (3, H, W), where W is neither 1 nor 3, this would be interpreted as 3 grayscale images.

Parameters:
  • image (FilePath | PIL.Image.Image | numpy.ndarray | Collection[FilePath | PIL.Image.Image | numpy.ndarray]) – The path(-s) to the image to generate the prediction for or the image(-s) itself represented as Image or as ndarray. Note that the image should have values between 0 and 255 and be of RGB format. Normalization is not needed as the channels will be automatically normalized before passing through the network.

  • format (str | Callable[[torch.Tensor], Default] | Callable[[PIL.Image.Image, torch.Tensor], Default], optional) –

    The string specifying the way to map the predictions to outputs of a specific format. The following options are available (if image is a Collection, then the output will be a list of corresponding items of the output type):

    | format  | output type | prediction mapping |
    |---------|-------------|--------------------|
    | "bool"  | numpy.ndarray of type numpy.bool_ of shape (H, W) | An array of shape (H, W) (i.e., output_size) with True values for pixels that fall in any of the bounding boxes |
    | "int"   | list of list of int | Bounding boxes with integer coordinates w.r.t. the original image size: [[x_min, y_min, x_max, y_max], ...] |
    | "float" | list of list of float | Bounding boxes with float coordinates normalized between 0 and 1: [[x_min, y_min, x_max, y_max], ...] |
    | "str"   | str | A string of the form "BBoxes: x_min y_min x_max y_max; ..." |
    | "img"   | PIL.Image.Image | The original image with bounding boxes drawn on it using default values in draw_boxes() |

    A custom callback function is also possible: it specifies how to map the original image (Image) and the bounding box predictions (Tensor of type torch.float32 of shape (K, 4), with K being the number of detected bboxes), or just the predictions, to a formatted Default output (see the sketch after this method's description). Defaults to "img".

  • output_size (tuple[int, int] | None, optional) – The size (width, height), or (W, H), the prediction (either the bbox coordinates or the image itself) should correspond to. If None, the prediction will correspond to the same size as the input image. Defaults to None.

  • input_size (tuple[int, int] | None, optional) – The size (width, height), or (W, H), to resize the image to before passing it through the network. If None, the image will not be resized. It is recommended to resize it to the size the model was trained on, which by default is (256, 256). Defaults to (256, 256).

Returns:

The formatted prediction or a list of formatted predictions if multiple images were provided.

Return type:

Default | list[Default]

Raises:

ValueError – If the specified format as a string is not recognized.
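
As referenced in the format description above, here is a sketch of a single-argument callback (the helper name is hypothetical) that maps the (K, 4) prediction tensor to a list of box areas:

>>> def box_areas(pred):  # pred: torch.Tensor of shape (K, 4)
...     ws, hs = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
...     return (ws * hs).tolist()
...
>>> det.predict("path/to/img.jpg", format=box_areas)  # returns a list of K areas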

classmethod from_model(model: Module, **kwargs) Self#

Creates a glasses model from a custom torch.nn.Module.

Creates a glasses model wrapper for a custom provided torch.nn.Module, instead of creating a predefined one based on kind and size.

Note

Make sure the provided model’s forward method behaves as expected, i.e., returns the prediction in expected format for compatibility with predict().

Warning

model_info property will not be useful as it would return an empty dictionary for custom specified kind and size (if specified at all).

Parameters:
  • model (torch.nn.Module) – The custom model that will be assigned as model.

  • **kwargs – Keyword arguments to pass to the constructor; check the documentation of this class for more details. If task, kind, and size are not provided, they will be set to "custom". If the model architecture is custom, you may still specify the path to the pre-trained weights via the weights argument. Finally, if device is not provided, the model will remain on its current device.

Returns:

The glasses model wrapper of the same class type from which this method was called for the provided custom model.

Return type:

Self
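
For instance, a torchvision detection model already follows the output contract from the class-level note, so wrapping it might look like this (a sketch; the weights below are randomly initialized):

>>> from torchvision.models.detection import ssdlite320_mobilenet_v3_large
>>> inner = ssdlite320_mobilenet_v3_large(num_classes=2)  # background + glasses
>>> det = GlassesDetector.from_model(inner, weights=False, device="cpu")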

load_weights(path_or_url: str | bool = True)#

Loads inner model weights.

Takes a path or a URL to the weights file, or True to construct the URL automatically based on model_info, and loads the weights into model.

Note

If the weights are already downloaded, they will be loaded from the hub cache, which by default is ~/.cache/torch/hub/checkpoints.

Warning

If the fields in model_info are not recognized, e.g., by providing an unrecognized kind or size or by initializing with from_model(), this method will not be able to construct the URL (if path_or_url is True) and will raise a warning.

Parameters:

path_or_url (str | bool, optional) – The path or the URL (it will be inferred automatically) to the model weights (.pth file). It can also be bool, in which case True indicates to construct URL for the pre-trained weights and False does nothing. Defaults to True.
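
For example, to skip the automatic download and load a local checkpoint instead (the path is a placeholder):

>>> det = GlassesDetector(weights=False)
>>> det.load_weights("path/to/checkpoint.pth")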

process_dir(input_path: FilePath, output_path: FilePath | None = None, ext: str | None = None, batch_size: int = 1, show: bool = False, pbar: bool | str | tqdm = True, update_total: bool = True, **pred_kwargs) dict[str, Default | None] | None#

Processes a directory of images.

Takes a path to a directory of images, optionally groups them into batches, generates the predictions for every image, and either returns them (if output_path is None) or saves them to a specified file or as separate files to a specified directory. The following cases are considered:

  1. If output_path is None, the predictions are returned as a dictionary of predictions where the keys are the names of the images and the values are the corresponding predictions.

  2. If output_path is a single file, the predictions are aggregated to a single file.

  3. If output_path is a directory, the predictions are saved to that directory. For each input path, a corresponding file is created in the specified output directory with the same name as the input. The extension, if not provided as ext, is set automatically as explained in process_file().

For more details on how each file type is saved, regardless if it is a single prediction or the aggregated predictions, see save().

NB: aggregating the predictions for multiple images into a single file/dictionary differs from process_file() (when multiple file paths are passed): here, only the names of the images are used as keys, rather than their full paths.

Tip

For very large directories, consider specifying output_path as a directory because aggregating the predictions to a single file or waiting for them to be returned might consume too much memory and lead to errors.

Note

Any files in the input directory that are not valid images, or those for which the prediction fails for any reason, are simply skipped and a warning is raised - for more details, see process_file().

Parameters:
  • input_path (FilePath) – The path to a directory of images to generate predictions for.

  • output_path (FilePath | None, optional) – The path to save the prediction(-s) to. If None, the predictions are returned as a dictionary, if a single file, the predictions are aggregated to a single file, and if a directory, the predictions are saved to that directory with the names copied from inputs. Defaults to None.

  • ext (str | None, optional) – The extension to use for the output file(-s). Only used when output_path is a directory. The extension should include a leading dot, e.g., ".txt", ".npy", ".jpg" etc (see save()). If None, the behavior follows process_file(). Defaults to None.

  • batch_size (int, optional) – The batch size to use when processing the images. This groups the files in the specified directory to batches of size batch_size before processing them. In some cases, larger batch sizes can speed up the processing at the cost of more memory usage. Defaults to 1.

  • show (bool, optional) – Whether to show the predictions. Images will be shown using PIL.Image.Image.show() and other predictions will be printed to stdout. It is not recommended to set this to True as it might spam your stdout. Defaults to False.

  • pbar (bool | str | tqdm, optional) – Whether to show a progress bar. If True, a progress bar with no description is shown. If str, a progress bar with the given description is shown. If an instance of tqdm, it is used as is. Defaults to True.

  • update_total (bool, optional) – Whether to update the total number of files in the progress bar. This is only relevant if pbar is an instance of tqdm. For example, if the number of total files is already known and captured by tqdm.tqdm.total, then there is no need to update it. Defaults to True.

  • **pred_kwargs – Additional keyword arguments to pass to predict().

Returns:

The dictionary of predictions if output_path is None or None if output_path is specified.

Return type:

dict[str, Default | None] | None
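
For example, batching and a labeled progress bar can be combined with keyword arguments forwarded to predict() (paths are placeholders):

>>> det.process_dir(
...     "path/to/dir",
...     "path/to/pred_dir",
...     ext=".txt",
...     batch_size=16,
...     pbar="Detecting",
...     format="int",  # forwarded to predict() via **pred_kwargs
... )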

process_file(input_path: FilePath | Collection[FilePath], output_path: FilePath | Collection[FilePath] | None = None, ext: str | None = None, show: bool = False, **pred_kwargs) Default | None | list[Default | None]#

Processes a single image or a list of images.

Takes a path to the image or a list of paths to images, generates the prediction(-s), and returns them, based on how predict() behaves. If the output path is specified, the prediction(-s) will be saved to the given path(-s) based on the extension of the output path. The following cases are considered:

  1. If output_path is None, no predictions are saved. If there are multiple output paths (one for each input path) and some of the entries are None, then the predictions corresponding to those entries are not saved.

  2. If the output path is a single file, then the predictions are saved to that file. If there are multiple input paths, then the corresponding predictions are aggregated to a single file.

  3. If output_path is a directory, then the prediction(-s) are saved to that directory. For each input path, a corresponding file is created in the specified output directory with the same name as the input. The extension, if not provided as ext, is set to .jpg for images and .txt for other predictions.

  4. If output_path is a list of output paths, then the predictions are saved to the corresponding output paths. If the numbers of input and output paths do not match, then the list of output paths is truncated or padded with None to match the number of input paths and a warning is raised. All the output paths are interpreted as files.

For more details on how each file type is saved, regardless if it is a single prediction or the aggregated predictions, see save().

NB: aggregating the predictions for multiple images into a single file differs from process_dir(): here, the full paths are used as sample identifiers, rather than just the names of the images.

Tip

If multiple images are provided (as a list of input paths), they are likely to be loaded into a single batch for a faster prediction (see predict() for more details), thus more memory is required than if they were processed individually. For this reason, consider not passing too many images at once (e.g., keep it under 200).

Note

If some input path does not lead to a valid image file, e.g., it does not exist, its prediction is set to None. Also, if at least one prediction fails, then all predictions are set to None. In both cases, a warning is raised and the corresponding files, or the lines in the aggregated file, are skipped (not saved).

Parameters:
  • input_path (FilePath | Collection[FilePath]) – The path to an image or a list of paths to images to generate predictions for.

  • output_path (FilePath | Collection[FilePath] | None, optional) – The path to save the prediction(-s) to. If None, no predictions are saved. If a single file, the predictions are aggregated (if multiple) and saved to that file. If a directory, the predictions are saved to that directory with the names copied from inputs. Defaults to None.

  • ext (str | None, optional) – The extension to use for the output file(-s). Only used when output_path is a directory. If None, the extension is set to ".jpg" for images and ".txt" for other predictions (depending on what predict() returns). For available options, refer to save(). Defaults to None.

  • show (bool, optional) – Whether to show the predictions. Images will be shown using PIL.Image.Image.show() and other predictions will be printed to stdout. Defaults to False.

  • **pred_kwargs – Additional keyword arguments to pass to predict().

Returns:

The prediction or a list of predictions for the given image(-s). Any failed predictions will be set to None.

Return type:

Default | None | list[Default | None]
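
For example, per-input output paths can be passed explicitly (paths are placeholders):

>>> det.process_file(
...     ["img1.jpg", "img2.jpg"],
...     ["pred1.txt", None],  # the second prediction is not saved
...     format="str",
... )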