HW7-2. Inference Components and Evaluation Metrics for Object Detection
Topics: Object detection, IoU (Intersection-over-Union), NMS (Non-Maximum Suppression), mAP (mean Average Precision)
Problem: Inference Components and Evaluation Metrics for Object Detection
In this part of the assignment, you will implement the inference and evaluation metrics required for Object Detection.
You will implement IoU, NMS, and mAP.
# We will import some libraries we need
import math
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data._utils.collate import default_collate
from torch import optim
from torchvision import transforms
import torchvision
from torchvision import models
from torchvision.models import feature_extraction
import matplotlib.pyplot as plt
import numpy as np
import os
import time
import random
from collections import Counter
!pip -q install torchmetrics
We will use GPUs to accelerate our computation in this notebook. Run the following to make sure GPUs are enabled:
if torch.cuda.is_available():
print('Good to go!')
DEVICE = torch.device("cuda")
else:
print('Please set GPU via Edit -> Notebook Settings.')
DEVICE = torch.device("cpu")
Good to go!
def reset_seed(number):
"""
Reset random seed to the specific number
Inputs:
- number: A seed number to use
"""
random.seed(number)
torch.manual_seed(number)
return
def rel_error(x, y, eps=1e-10):
"""
Compute the relative error between a pair of tensors x and y,
which is defined as:
                            max_i |x_i - y_i|
rel_error(x, y) = -------------------------------
max_i |x_i| + max_i |y_i| + eps
Inputs:
- x, y: Tensors of the same shape
- eps: Small positive constant for numeric stability
Returns:
- rel_error: Scalar giving the relative error between x and y
"""
""" returns relative error between x and y """
top = (x - y).abs().max().item()
bot = (x.abs() + y.abs()).clamp(min=eps).max().item()
return top / bot
(a) Intersection Over Union (IoU)
Your first task is to implement the IoU function, which estimates the intersection-over-union (IoU) of two bounding boxes (also called the Jaccard similarity). Recall that IoU for two boxes B1 and B2 is:
\[IoU(B_1, B_2) = \frac{|B_1 \cap B_2|}{|B_1 \cup B_2|}\]
This function is used to calculate the IoU between the predicted and ground-truth bounding boxes. It will be used as part of your implementation of the mAP function, and in the Generalized IoU loss function of the object detection model.
Implement the intersection_over_union function given below. We will then compare your implementation of IoU with a toy example. We will later reuse your IoU implementation for computing mean average precision (mAP).
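For intuition, here is a quick hand computation on toy boxes: with B1 = (0, 0, 2, 2) and B2 = (1, 1, 3, 3) in XYXY format, both boxes have area 4 and their intersection is the unit square (1, 1, 2, 2), so
\[IoU(B_1, B_2) = \frac{1}{4 + 4 - 1} = \frac{1}{7} \approx 0.143\]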
def intersection_over_union(boxes1: torch.Tensor, boxes2: torch.Tensor) -> torch.Tensor:
"""
Compute intersection-over-union (IoU) between pairs of box tensors. Input
box tensors must in XYXY format.
Args:
boxes1: Tensor of shape `(M, 4)` giving a set of box co-ordinates.
boxes2: Tensor of shape `(N, 4)` giving another set of box co-ordinates.
Returns:
torch.Tensor
Tensor of shape (M, N) with `iou[i, j]` giving IoU between i-th box
in `boxes1` and j-th box in `boxes2`.
"""
##########################################################################
# TODO: Implement the IoU function here.
# Hint: You can use torch.max and torch.min here
# Remember to account for the case that the boxes do not intersect
# You may use a for loop if necessary, but vectorized code is always better
##########################################################################
# Replace "pass" statement with your code
boxes1_area = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
boxes1_area = boxes1_area.reshape(boxes1.shape[0], 1)
boxes2_area = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
boxes2_area = boxes2_area.reshape(1, boxes2.shape[0])
area_sum = boxes1_area + boxes2_area
x_intersection_left = torch.max(boxes1[:, 0].reshape(boxes1.shape[0], 1), boxes2[:, 0])
x_intersection_right = torch.min(boxes1[:, 2].reshape(boxes1.shape[0], 1), boxes2[:, 2])
y_intersection_top = torch.max(boxes1[:, 1].reshape(boxes1.shape[0], 1), boxes2[:, 1])
y_intersection_bottom = torch.min(boxes1[:, 3].reshape(boxes1.shape[0], 1), boxes2[:, 3])
intersection_x_delta = (x_intersection_right - x_intersection_left)
intersection_x_delta[intersection_x_delta < 0] = 0
intersection_y_delta = (y_intersection_bottom - y_intersection_top)
intersection_y_delta[intersection_y_delta < 0] = 0
intersection_area = intersection_x_delta * intersection_y_delta
union_area = area_sum - intersection_area
iou = intersection_area / union_area
##########################################################################
# END OF YOUR CODE #
##########################################################################
return iou
boxes1 = torch.Tensor([[10, 10, 90, 90], [20, 20, 40, 40], [60, 60, 80, 80]])
boxes2 = torch.Tensor([[10, 10, 90, 90], [60, 60, 80, 80], [30, 30, 70, 70]])
expected_iou = torch.Tensor(
[[1.0, 0.0625, 0.25], [0.0625, 0.0, 0.052631579], [0.0625, 1.0, 0.052631579]]
)
result_iou = intersection_over_union(boxes1, boxes2)
print("Relative error:", rel_error(expected_iou, result_iou))
Relative error: 0.0
(b) Non-Maximum Suppression (NMS)
At test time, we will discard redundant bounding box predictions as a post-processing step. Specifically, we will use non-maximum suppression (NMS) to remove any bounding box prediction that significantly overlaps another prediction with a higher confidence score. Slides 47 and 48 in the object detection lecture and the Discussion 9 slides describe the NMS process.
Implement the nms function given below. We will then compare your implementation of NMS with the implementation in torchvision. Your implementation will most likely run faster on CPU than on CUDA, and the torchvision implementation will likely be much faster than yours. This is expected, but your implementation should produce the same outputs as the torchvision version.
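As a quick sanity check of the intended behavior, consider a toy example (numbers chosen for illustration, not part of the grader): box B heavily overlaps box A but has a lower score, so NMS at threshold 0.5 should suppress B while keeping A and the disjoint box C. torchvision's reference implementation confirms this, and your nms should return the same indices on this input.
import torch
import torchvision
# B overlaps A with IoU = 81/119 ~ 0.68 > 0.5 and has a lower score,
# so it is suppressed; A and C survive.
boxes = torch.tensor([[ 0.,  0., 10., 10.],   # A
                      [ 1.,  1., 11., 11.],   # B, heavily overlaps A
                      [20., 20., 30., 30.]])  # C, disjoint from both
scores = torch.tensor([0.9, 0.8, 0.7])
print(torchvision.ops.nms(boxes, scores, iou_threshold=0.5))  # tensor([0, 2])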
def nms(boxes: torch.Tensor, scores: torch.Tensor, iou_threshold: float = 0.5):
"""
Non-maximum suppression removes overlapping bounding boxes.
Args:
boxes: Tensor of shape (N, 4) giving top-left and bottom-right coordinates
of the bounding boxes to perform NMS on.
        scores: Tensor of shape (N,) giving scores for each of the boxes.
iou_threshold: Discard all overlapping boxes with IoU > iou_threshold
Returns:
keep: torch.long tensor with the indices of the elements that have been
kept by NMS, sorted in decreasing order of scores;
of shape [num_kept_boxes]
"""
if (not boxes.numel()) or (not scores.numel()):
return torch.zeros(0, dtype=torch.long)
keep = None
#############################################################################
# TODO: Implement non-maximum suppression which iterates the following: #
# 1. Select the highest-scoring box among the remaining ones, #
# which has not been chosen in this step before #
# 2. Eliminate boxes with IoU > threshold #
# 3. If any boxes remain, GOTO 1 #
# Your implementation should not depend on a specific device type; #
# you can use the device of the input if necessary. #
# HINT: You can refer to the torchvision library code: #
# github.com/pytorch/vision/blob/main/torchvision/csrc/ops/cpu/nms_kernel.cpp
#############################################################################
# Replace "pass" statement with your code
    # Indices of boxes sorted by ascending score; the highest-scoring
    # remaining box is always at the end of this list.
    scores_order = scores.sort()[1]
    keep = []
    while len(scores_order) > 0:
        # 1. Pick the highest-scoring remaining box.
        highest_idx = scores_order[-1]
        keep.append(highest_idx.item())
        scores_order = scores_order[:-1]
        # 2. Suppress remaining boxes that overlap it with IoU > threshold.
        highest_box = boxes[highest_idx]
        threshold_mask = (
            intersection_over_union(boxes[scores_order], highest_box.reshape(1, -1))
            <= iou_threshold
        ).flatten()
        scores_order = scores_order[threshold_mask]
    keep = torch.tensor(keep, dtype=torch.long)
#############################################################################
# END OF YOUR CODE #
#############################################################################
return keep
The difference between your implementation of nms and the one in the torchvision package should be on the order of 1e-3 or less.
#Sanity check
reset_seed(0)
boxes = (100.0 * torch.rand(5000, 4)).round()
boxes[:, 2] = boxes[:, 2] + boxes[:, 0] + 1.0
boxes[:, 3] = boxes[:, 3] + boxes[:, 1] + 1.0
scores = torch.randn(5000)
names = ["your_cpu", "torchvision_cpu", "torchvision_cuda"]
iou_thresholds = [0.3, 0.5, 0.7]
elapsed = dict(zip(names, [0.0] * len(names)))
intersects = dict(zip(names[1:], [0.0] * (len(names) - 1)))
for iou_threshold in iou_thresholds:
tic = time.time()
my_keep = nms(boxes, scores, iou_threshold)
elapsed["your_cpu"] += time.time() - tic
tic = time.time()
tv_keep = torchvision.ops.nms(boxes, scores, iou_threshold)
elapsed["torchvision_cpu"] += time.time() - tic
intersect = len(set(tv_keep.tolist()).intersection(my_keep.tolist())) / len(tv_keep)
intersects["torchvision_cpu"] += intersect
tic = time.time()
tv_cuda_keep = torchvision.ops.nms(boxes.to(device=DEVICE), scores.to(device=DEVICE), iou_threshold).to(
my_keep.device
)
torch.cuda.synchronize()
elapsed["torchvision_cuda"] += time.time() - tic
intersect = len(set(tv_cuda_keep.tolist()).intersection(my_keep.tolist())) / len(
tv_cuda_keep
)
intersects["torchvision_cuda"] += intersect
for key in intersects:
intersects[key] /= len(iou_thresholds)
# You should see < 1% difference
print("Testing NMS:")
print("Your CPU implementation: %fs" % elapsed["your_cpu"])
print("torchvision CPU implementation: %fs" % elapsed["torchvision_cpu"])
print("torchvision CUDA implementation: %fs" % elapsed["torchvision_cuda"])
print("Speedup CPU : %fx" % (elapsed["your_cpu"] / elapsed["torchvision_cpu"]))
print("Speedup CUDA: %fx" % (elapsed["your_cpu"] / elapsed["torchvision_cuda"]))
print(
"Difference CPU : ", 1.0 - intersects["torchvision_cpu"]
) # in the order of 1e-3 or less
print(
"Difference CUDA: ", 1.0 - intersects["torchvision_cuda"]
) # in the order of 1e-3 or less
Testing NMS:
Your CPU implementation: 1.176934s
torchvision CPU implementation: 0.070970s
torchvision CUDA implementation: 0.009614s
Speedup CPU : 16.583588x
Speedup CUDA: 122.418907x
Difference CPU : 0.0012674271229404788
Difference CUDA: 0.0
(c) Mean Average Precision (mAP)
To evaluate the performance of our object detector, we will use the mean Average Precision (mAP) metric. mAP is computed by averaging the average precision (AP), i.e. the area under each class's precision-recall curve, over all the classes in the dataset. Slides 44, 45 and 46 in the object detection lecture and the Discussion 9 slides give detailed information on the process to calculate mAP.
Implement the mean_average_precision function given below. We will then compare your implementation of mAP against torchmetrics on a toy example. The relative error you get should be small, on the order of 1e-3 or less.
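To make the precision-recall bookkeeping concrete, here is a minimal hand-checkable sketch (toy numbers: a single class with 3 ground-truth boxes, where match_result marks each confidence-ranked detection as TP (1) or FP (0)). It mirrors the cumulative-sum and torch.trapz integration steps used inside the function below.
import numpy as np
import torch
# Three detections already sorted by descending confidence: TP, FP, TP.
match_result = np.array([1, 0, 1])
total_gt_boxes = 3
tp_cum = np.cumsum(match_result)                     # [1, 1, 2]
precisions = torch.tensor(tp_cum / np.arange(1, 4))  # [1.000, 0.500, 0.667]
recalls = torch.tensor(tp_cum / total_gt_boxes)      # [0.333, 0.333, 0.667]
# Prepend (recall=0, precision=1), then integrate the PR curve.
precisions = torch.cat((torch.tensor([1.0]), precisions))
recalls = torch.cat((torch.tensor([0.0]), recalls))
print(torch.trapz(precisions, recalls))              # ~0.5278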
def mean_average_precision(
    pred_boxes, true_boxes, iou_threshold=0.5, num_classes=20
):
    """
    Calculates mean average precision (mAP).
    Parameters:
        pred_boxes (Tensor): shape (N, 7); each row is one predicted box,
            given as [train_idx, class_prediction, prob_score, x1, y1, x2, y2],
            where train_idx identifies the image the box belongs to
        true_boxes (Tensor): shape (M, 6); each row is one ground-truth box,
            given as [train_idx, class_label, x1, y1, x2, y2]
        iou_threshold (float): IoU above which a predicted box counts as correct
        num_classes (int): number of classes
    Returns:
        float: mAP value across all classes given a specific IoU threshold
    """
# list storing all AP for respective classes
average_precisions = []
for c in range(num_classes):
detections = []
ground_truths = []
##########################################################################
# TODO: Implement the mAP function here.
# 1. For each class, find the ground truths and predicted boxes for that class and fill the list detections, ground_truths
# 2. Sort the predicted boxes by class scores
# 3. Find the number of ground truth bounding boxes for each image
# 4. Keep track of all the GT boxes covered. If a predicted box is counted as True Positive based off a particular GT box,
# this GT box should not be matched to any other predicted box thereafter
# 5. For each detection, calculate IoU with gt boxes of the same image and remember to account for the gt boxes that have already been covered
##########################################################################
# Replace "pass" statement with your code
detections = pred_boxes[pred_boxes[:, 1] == c]
ground_truths = true_boxes[true_boxes[:, 1] == c]
total_gt_boxes = ground_truths.shape[0]
if (total_gt_boxes == 0):
continue
confidence_list = []
match_result = []
for i in detections[:, 0]:
detections_in_i = detections[detections[:, 0] == i]
detections_in_i = detections_in_i[torch.sort(detections_in_i[:, 2], descending = True)[1]]
gt_in_i = ground_truths[ground_truths[:, 0] == i]
for detection in detections_in_i:
detection_box = detection[3:].reshape(1, -1)
gt_boxes = gt_in_i[:, 2:]
IOUs = intersection_over_union(gt_boxes, detection_box)
if IOUs.shape[0] > 0:
max_IOU = torch.max(IOUs, dim = 0)[0]
max_idx = torch.argmax(IOUs)
else:
max_IOU = -1
if (max_IOU > iou_threshold):
match_result.append(1)
gt_boxes = torch.concat((gt_boxes[:max_idx, :], gt_boxes[max_idx + 1:, :]))
else:
match_result.append(0)
confidence_list.append(detections_in_i[:, 2].item())
confidence_list = np.array(confidence_list)
if (len(confidence_list) == 0):
continue
match_result = np.array(match_result)
sorted_idx = np.argsort(-confidence_list)
match_result = match_result[sorted_idx]
tpADDfp = np.arange(1, len(match_result) + 1)
tpCum = np.cumsum(match_result)
precisions = torch.tensor(tpCum / tpADDfp)
recalls = torch.tensor(np.nan_to_num(tpCum / total_gt_boxes))
##########################################################################
# END OF YOUR CODE #
##########################################################################
        # precisions - y values, recalls - x values. We prepend 1 to precisions
        # and 0 to recalls so the curve starts at (recall=0, precision=1)
        # before numerical integration.
        precisions = torch.cat((torch.tensor([1]), precisions))
        recalls = torch.cat((torch.tensor([0]), recalls))
        # torch.trapz integrates the area under the precision-recall curve.
        average_precisions.append(torch.trapz(precisions, recalls))
    # Average over the classes that have at least one ground-truth box.
    mAP = sum(average_precisions) / len(average_precisions)
    return mAP
pred_boxes = torch.Tensor([[0, 0, 0.536, 258.0, 41.0, 606.0, 285.0],
                           [1, 3, 0.318, 61.0, 22.75, 565.0, 632.42],
                           [1, 2, 0.725, 12.66, 3.32, 281.26, 275.23]])
gt_boxes = torch.Tensor([[0, 0, 214.0, 41.0, 562.0, 285.0],
                         [1, 2, 13.00, 22.75, 548.98, 632.42],
                         [1, 2, 1.66, 3.32, 270.26, 275.23]])
num_classes = 20
iou_thresholds = torch.arange(0.5, 0.96, 0.05)
from torchmetrics.detection.mean_ap import MeanAveragePrecision
# MeanAveragePrecision expects the following structure: preds=[dict for img1, dict for img2, ...], target=[dict for img1, dict for img2, ...]
preds = [
dict(
boxes=torch.tensor([[258.15, 41.29, 606.41, 285.07]]),
scores=torch.tensor([0.536]),
labels=torch.tensor([0]),
),
dict(
boxes=torch.tensor([[61.0, 22.75, 565.0, 632.42],[12.66,3.32,281.26,275.23]]),
scores=torch.tensor([0.318,0.725]),
labels=torch.tensor([3,2]),
)
]
target = [
dict(
boxes=torch.tensor([[214.15, 41.29, 562.41, 285.07]]),
labels=torch.tensor([0]),
),
dict(
boxes=torch.tensor([[13.00, 22.75, 548.98, 632.42],[1.66,3.32,270.26,275.23]]),
labels=torch.tensor([2,2]),
)
]
metric = MeanAveragePrecision()
metric.update(preds, target)
expected_map = metric.compute()["map"]
result_map = 0
for iou_thresh in iou_thresholds:
    iou_thresh = round(iou_thresh.item(), 2)
    result_map += mean_average_precision(
        pred_boxes=pred_boxes, true_boxes=gt_boxes,
        num_classes=num_classes, iou_threshold=iou_thresh,
    )
result_map = result_map / iou_thresholds.shape[0]
print("resulting mAP", result_map.item())
print("Relative error:", rel_error(expected_map, result_map))
resulting mAP 0.525
Relative error: 0.002117127079184053
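The residual error here most likely reflects a methodological difference rather than a bug: torchmetrics' MeanAveragePrecision computes COCO-style AP using 101-point interpolation of the precision-recall curve, while our implementation integrates the raw curve with torch.trapz, so a gap on the order of 1e-3 is expected.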