Tiny Binary Detector

class glasses_detector.architectures.tiny_binary_detector.TinyBinaryDetector

Bases: Module

Tiny binary detector.

This is a custom detector designed to have very few parameters while maintaining reasonable accuracy. It consists only of several sequential convolutional and pooling blocks, with batch normalization in between.

Note

I tried varying the architecture, including activations, convolution behavior (groups and stride), pooling, and layer structure. This also included residual and dense connections, as well as combinations of these. It turns out that none of them performed as well as the current architecture, which is just a stack of CONV-RELU-BN-MAXPOOL blocks with no padding.
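The block order described above can be sketched in PyTorch as follows. This is a minimal illustration, not the package's actual layer configuration: the channel widths, the number of blocks, the 3x3 kernels, and the pooled linear head that regresses four box coordinates are all assumptions. It mainly shows how unpadded CONV-RELU-BN-MAXPOOL blocks shrink the feature map and how small the parameter count stays.

```python
import torch
from torch import nn


def conv_block(in_channels: int, out_channels: int) -> nn.Sequential:
    # One CONV-RELU-BN-MAXPOOL block. With padding=0, each 3x3 conv trims
    # one pixel from every border, then max-pooling halves the spatial size.
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=0),
        nn.ReLU(),
        nn.BatchNorm2d(out_channels),
        nn.MaxPool2d(kernel_size=2),
    )


class TinyBackboneSketch(nn.Module):
    """Hypothetical stack of blocks; channel widths are illustrative only."""

    def __init__(self, channels: tuple[int, ...] = (3, 6, 12, 24, 48)):
        super().__init__()
        # Chain blocks so each one's output width feeds the next one's input.
        self.features = nn.Sequential(
            *[conv_block(c_in, c_out) for c_in, c_out in zip(channels, channels[1:])]
        )
        # Assumed head: global average pooling, then 4 box coordinates.
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(channels[-1], 4)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))


model = TinyBackboneSketch()
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 14236 with the default widths
out = model(torch.randn(2, 3, 128, 128))
print(out.shape)  # torch.Size([2, 4])
```

With a 128x128 input, the spatial size goes 126 → 63 → 61 → 30 → 28 → 14 → 12 → 6 through the four blocks, so no padding is needed to keep the feature maps usable at this depth.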

forward(imgs: list[Tensor], targets: list[dict[str, Tensor]] | None = None) → dict[str, Tensor] | list[dict[str, Tensor]]

Forward pass through the network.

This takes a list of images and returns either a list of predictions (one per image) or, if targets are provided, a loss dictionary. This matches the API of the PyTorch torchvision detection models, which specify that:

“During training, returns a dictionary containing the classification and regression losses for each image in the batch. During inference, returns a list of dictionaries, one for each input image. Each dictionary contains the predicted boxes, labels, and scores for all detections in the image.”

Parameters:

imgs (list[torch.Tensor]) – A list of images.
targets (list[dict[str, torch.Tensor]] | None, optional) –
A list of annotations, one per image. Each annotation is a dictionary that contains:
1. "boxes": the bounding boxes for each object
2. "labels": the labels for all objects in the image
If None, the network is in inference mode. Defaults to None.
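For illustration, inputs in the torchvision-style format described above might be built like this. The single box per image, the label value 1, and the (x_min, y_min, x_max, y_max) pixel coordinate convention borrowed from torchvision are assumptions, and check_targets is a hypothetical helper, not part of glasses_detector.

```python
import torch

# Two hypothetical RGB images of the same size, values in [0, 1].
imgs = [torch.rand(3, 256, 256), torch.rand(3, 256, 256)]

# One target dictionary per image (assumed torchvision convention):
# "boxes" is a float tensor of shape (N, 4) as (x_min, y_min, x_max, y_max),
# "labels" is an int64 tensor of shape (N,). Here N = 1 for every image.
targets = [
    {"boxes": torch.tensor([[40.0, 60.0, 200.0, 120.0]]), "labels": torch.tensor([1])},
    {"boxes": torch.tensor([[30.0, 80.0, 220.0, 140.0]]), "labels": torch.tensor([1])},
]


def check_targets(imgs: list[torch.Tensor], targets: list[dict[str, torch.Tensor]]) -> bool:
    """Hypothetical helper: verify the structure described above."""
    assert len(imgs) == len(targets), "expected one target dict per image"
    for t in targets:
        # Boxes must be a 2-D tensor with 4 coordinates per row.
        assert t["boxes"].ndim == 2 and t["boxes"].shape[1] == 4
        # Exactly one int64 label per box.
        assert t["labels"].shape == (t["boxes"].shape[0],)
        assert t["labels"].dtype == torch.int64
    return True
```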

Returns:

A dictionary with a single “regression” loss entry if targets are provided. Otherwise, a list of dictionaries, one per image, with the predicted bounding boxes, labels, and scores for all detections.

Return type:

dict[str, torch.Tensor] | list[dict[str, torch.Tensor]]
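The two return modes can be illustrated with a toy module that follows the same contract. ToyBoxRegressor is hypothetical and far simpler than TinyBinaryDetector; it predicts exactly one box per image and fills labels and scores with ones, an assumption made only to produce the documented output structure.

```python
import torch
import torch.nn.functional as F
from torch import nn


class ToyBoxRegressor(nn.Module):
    """Hypothetical stand-in following the training/inference contract."""

    def __init__(self):
        super().__init__()
        # Predicts one box (4 numbers) per image from its mean channel values.
        self.fc = nn.Linear(3, 4)

    def forward(self, imgs, targets=None):
        x = torch.stack([img.mean(dim=(1, 2)) for img in imgs])  # (B, 3)
        boxes = self.fc(x)  # (B, 4): one box per image
        if targets is not None:
            # Training mode: return a loss dictionary.
            target_boxes = torch.stack([t["boxes"][0] for t in targets])
            return {"regression": F.mse_loss(boxes, target_boxes)}
        # Inference mode: one prediction dictionary per image.
        return [
            {
                "boxes": b.unsqueeze(0),  # (1, 4)
                "labels": torch.ones(1, dtype=torch.int64),
                "scores": torch.ones(1),
            }
            for b in boxes
        ]


model = ToyBoxRegressor()
imgs = [torch.rand(3, 64, 64) for _ in range(2)]
targets = [{"boxes": torch.rand(1, 4), "labels": torch.tensor([1])} for _ in range(2)]

losses = model(imgs, targets)  # training-style call
model.eval()
with torch.no_grad():
    preds = model(imgs)  # inference-style call

print(sorted(losses))  # ['regression']
print(len(preds), preds[0]["boxes"].shape)  # 2 torch.Size([1, 4])
```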

compute_loss(preds: list[Tensor], targets: list[dict[str, Tensor]], size: tuple[int, int]) → dict[str, Tensor]

Compute the loss for the predicted bounding boxes.

This computes the MSE loss between the predicted bounding boxes and the target bounding boxes. The returned dictionary contains only one key: “regression”.

Parameters:

preds (list[torch.Tensor]) – A list of predicted bounding boxes for each image.
targets (list[dict[str, torch.Tensor]]) – A list of targets for each image.
size (tuple[int, int])

Returns:

A dictionary with a single key, “regression”, which contains the MSE regression loss.

Return type:

dict[str, torch.Tensor]
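A minimal sketch of the MSE regression loss described above, assuming one predicted box per image whose shape matches the corresponding target boxes. regression_loss_sketch is a hypothetical name; the documented method also takes a size argument, whose role is not described here, so the sketch leaves it out rather than guess.

```python
import torch
import torch.nn.functional as F


def regression_loss_sketch(
    preds: list[torch.Tensor], targets: list[dict[str, torch.Tensor]]
) -> dict[str, torch.Tensor]:
    """Hypothetical re-creation: MSE between predicted and target boxes.

    Assumes preds[i] and targets[i]["boxes"] have the same shape,
    e.g. (1, 4) for a single box per image.
    """
    # Concatenate per-image boxes into (total_boxes, 4) tensors.
    pred_boxes = torch.cat(preds)
    target_boxes = torch.cat([t["boxes"] for t in targets])
    # Mean squared error averaged over every coordinate of every box.
    return {"regression": F.mse_loss(pred_boxes, target_boxes)}


preds = [torch.tensor([[10.0, 20.0, 30.0, 40.0]])]
targets = [{"boxes": torch.tensor([[12.0, 20.0, 30.0, 38.0]]), "labels": torch.tensor([1])}]
result = regression_loss_sketch(preds, targets)
print(result)  # {'regression': tensor(2.)}
```

Here the coordinate errors are (-2, 0, 0, 2), so the loss is (4 + 0 + 0 + 4) / 4 = 2.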
