COCO Dataset - Using Faster R-CNN + MobileNet for Object Detection
Introduction
I recently took an AI course whose main topics were the following:
- Learning about the COCO dataset
- Using a pre-trained version of Faster R-CNN to predict bounding boxes
- Calculating IoU
Homework Requirements
- Download the COCO dataset: download the files “2017 Val images [5/1GB]” and “2017 Train/Val annotations [241MB]” from the COCO page. You can load them into your notebook using the pycocotools library.
- Randomly select ten images from the dataset: 10 images are randomly selected from this dataset.
- Predict boxes using the pre-trained Faster R-CNN model: use a pre-trained version of Faster R-CNN (ResNet50 backbone) to predict the bounding boxes of objects in the 10 images. Only regions with scores greater than 0.8 are retained.
- Visualize the predictions together with the ground truth: visualize the predicted bounding boxes and labels together with the ground-truth bounding boxes and labels. Show all 10 pairs of images side by side in the Jupyter notebook.
- Use another pre-trained model, MobileNet: repeat the above steps using the MobileNet backbone for Faster R-CNN.
- Calculate IoU and compare models: which backbone provides better results? Calculate the IoU for both methods.
Task 1: Downloading the COCO Dataset
Task 1
- Download the COCO Dataset: Obtain the files “2017 Val images [5/1GB]” and “2017 Train/Val annotations [241MB]” from the COCO page. Use the pycocotools library to load them into your notebook.
You can follow this guide to proceed with the download: Download COCO Dataset
- Download the two files as shown in the image.
- After downloading and extracting, the folder structure will resemble the sketch below.
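A rough sketch of the expected layout (the root folder name is an assumption based on the paths used later in this post; val2017 contains the 5,000 validation images):

```
Coco/
├── annotations/
│   ├── instances_train2017.json
│   ├── instances_val2017.json
│   └── ...
└── val2017/
    └── ... (5,000 .jpg images)
```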
Task 2: Randomly Select Ten Images
Task 2
2. Randomly Select Ten Images from the Dataset: Pick 10 images randomly from this dataset.
Here, we’ll primarily do a few things:
- Import necessary libraries.
- Set up the COCO API so we can access relevant information from our dataset, such as bounding box positions, labels, and image information.
- Visualize images and draw their annotations.
- Randomly select ten images.
Let’s begin by importing the necessary libraries.
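A plausible set for this notebook (inferred from the snippets later in this post; your exact list may differ):

```python
import os
import random

import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np
import torch
import torchvision
from PIL import Image
from pycocotools.coco import COCO
```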
Setting up the COCO API
COCO provides an API to access datasets. By providing it with a JSON file, we can easily retrieve the necessary information such as images, labels, bounding boxes, and more.
```python
# Specify dataset location
```
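Fleshed out, this step might look like the following sketch (the paths match the output below; `cocoRoot` and `dataType` are the names the later snippets expect):

```python
import os
from pycocotools.coco import COCO

cocoRoot = "../../Data/Coco"  # dataset root from Task 1
dataType = "val2017"

# Build the path to the instance annotations and initialize the COCO API
annFile = os.path.join(cocoRoot, f"annotations/instances_{dataType}.json")
print(f"Annotation file: {annFile}")
coco = COCO(annFile)
```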
Result
```
Annotation file: ../../Data/Coco/annotations/instances_val2017.json
```
Annotation Visualization
To ensure familiarity with the COCO-provided API, here’s an exercise focusing on the following:
- Obtaining image info by ID
- Retrieving annotation info by ID
- Learning to draw bounding boxes and labels on images
```python
import matplotlib.pyplot as plt
```
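A sketch of this exercise (the choice of image ID and the plotting details are assumptions; `coco`, `cocoRoot`, and `dataType` come from the setup above, and `loadImgs`, `getAnnIds`, `loadAnns`, and `loadCats` are the standard pycocotools calls):

```python
import os
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image

# Obtain image info by ID (here simply the first ID in the set)
img_id = coco.getImgIds()[0]
img_info = coco.loadImgs(img_id)[0]

# Retrieve the annotation info for that image
ann_ids = coco.getAnnIds(imgIds=img_id)
anns = coco.loadAnns(ann_ids)

# Draw bounding boxes and labels on the image
img = Image.open(os.path.join(cocoRoot, dataType, img_info["file_name"]))
fig, ax = plt.subplots()
ax.imshow(img)
for ann in anns:
    x, y, w, h = ann["bbox"]  # COCO boxes are (x, y, width, height)
    name = coco.loadCats(ann["category_id"])[0]["name"]
    ax.add_patch(patches.Rectangle((x, y), w, h, fill=False, edgecolor="red"))
    ax.text(x, y, name, color="white", backgroundcolor="red")
plt.show()
```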
Result
Randomly Select Ten Images
One straightforward way is to sample image IDs at random, roughly as follows:

```python
import random

def random_select(coco, cocoRoot, dataType, num_images=10):
    # Sample `num_images` image IDs at random from the dataset
    img_ids = coco.getImgIds()
    return random.sample(img_ids, num_images)
```
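Usage might look like this (a sketch; `selected_ids` is a name introduced here and reused in the later steps):

```python
selected_ids = random_select(coco, cocoRoot, dataType)
print(selected_ids)
```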
Result
Task 3+5: Faster R-CNN vs. MobileNet
Task 3 & 5
3. Predict bounding boxes using the pre-trained Faster R-CNN model: use a pre-trained version of Faster R-CNN (ResNet50 backbone) to predict the bounding boxes of objects on the 10 images. Only keep regions that have a score > 0.8.
5. Use another pre-trained model, MobileNet: repeat the steps above using a MobileNet backbone for Faster R-CNN.
Using the pre-trained models
```python
# Using the pre-trained model (Faster R-CNN)
```
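Loading both detectors from torchvision might look like this sketch (the MobileNetV3-Large FPN variant is an assumption, since torchvision also offers a 320-resolution variant; on newer torchvision versions, use `weights="DEFAULT"` instead of `pretrained=True`):

```python
import torchvision

# Faster R-CNN with a ResNet50-FPN backbone, pretrained on COCO
model_resnet = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model_resnet.eval()  # inference mode

# Faster R-CNN with a MobileNetV3-Large FPN backbone, pretrained on COCO
model_mobilenet = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_fpn(pretrained=True)
model_mobilenet.eval()
```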
Convert image to tensor
We need to locate each image on disk from its file path, and then convert the loaded image into a tensor before we can feed it into the model for prediction. So we write two functions:
- One to read the image.
- One to convert the image into a tensor.
```python
from PIL import Image
```
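A sketch of the two helpers (the names `read_image` and `to_tensor` are introduced here; COCO stores each image's file name in its image info):

```python
import os
from PIL import Image
import torchvision.transforms as T

def read_image(coco, cocoRoot, dataType, img_id):
    # Look up the file name for this image ID and load it from disk
    img_info = coco.loadImgs(img_id)[0]
    img_path = os.path.join(cocoRoot, dataType, img_info["file_name"])
    return Image.open(img_path).convert("RGB")

def to_tensor(img):
    # Convert a PIL image to a (C, H, W) float tensor in [0, 1]
    return T.ToTensor()(img)
```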
Running the models
After the pre-trained models are loaded, we run them on the ten images and save the prediction results. The process is as follows:
```python
# Save the prediction result
```
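A sketch of the prediction loop (it assumes `selected_ids` from Task 2 and the helpers above; torchvision detection models take a list of image tensors and return one dict of boxes, labels, and scores per image):

```python
import torch

predictions_resnet = {}
predictions_mobilenet = {}

with torch.no_grad():  # no gradients needed for inference
    for img_id in selected_ids:
        img = read_image(coco, cocoRoot, dataType, img_id)
        tensor = to_tensor(img)
        # Save the prediction result for each model, keyed by image ID
        predictions_resnet[img_id] = model_resnet([tensor])[0]
        predictions_mobilenet[img_id] = model_mobilenet([tensor])[0]
```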
Only select prediction results with scores > 0.8
Once the model has made its predictions, we keep only the results with scores greater than 0.8. The model predicts a lot of bounding boxes, but we only want the high-confidence ones, so we filter out the low-scoring boxes, along the lines of the following sketch:
```python
def filter_valid_boxes(predictions, threshold=0.8):
    # Keep only the boxes whose confidence score exceeds the threshold
    # (predictions is the standard torchvision dict of boxes/labels/scores)
    keep = predictions["scores"] > threshold
    return {
        "boxes": predictions["boxes"][keep],
        "labels": predictions["labels"][keep],
        "scores": predictions["scores"][keep],
    }
```
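Applied to the saved predictions, this might look like the following (a sketch using the names introduced above):

```python
filtered_resnet = {i: filter_valid_boxes(p) for i, p in predictions_resnet.items()}
filtered_mobilenet = {i: filter_valid_boxes(p) for i, p in predictions_mobilenet.items()}
```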
Task 4+6: Visualization + IoU
Tasks 4 & 6
- Visualize the model output together with the ground truth: visualize the predicted bounding boxes and labels together with the ground-truth bounding boxes and labels. Show all 10 pairs of images side by side in the Jupyter notebook.
- Calculate IoU to compare models: which backbone delivers the better results? Calculate the IoU for both approaches.
There are a few important points in the visualization; the steps are as follows:
- We first need the ID of each image; from the ID we get its annotation information, and only then can we calculate the IoU.
- We take the annotation information and the model's predictions and calculate the IoU between them.
- We look up the image's location on disk and, from its path, draw the image with plt first; on top of the image we then draw the predicted boxes and labels, as well as the mean IoU.
The following program carries out the procedure described above: we draw the results of both models and calculate the mean IoU.
```python
import matplotlib.pyplot as plt
```
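A condensed sketch of the visualization loop (the helper and variable names are the ones introduced above; note that COCO ground-truth boxes come as (x, y, w, h) and must be converted to corner format before comparing with the model's (x1, y1, x2, y2) boxes):

```python
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import torch
import torchvision

def draw_boxes(ax, boxes, color):
    # Draw (x1, y1, x2, y2) boxes on a matplotlib axis
    for x1, y1, x2, y2 in boxes:
        ax.add_patch(patches.Rectangle((x1, y1), x2 - x1, y2 - y1,
                                       fill=False, edgecolor=color, linewidth=2))

for img_id in selected_ids:
    img = read_image(coco, cocoRoot, dataType, img_id)

    # Ground-truth boxes: convert COCO (x, y, w, h) to (x1, y1, x2, y2)
    anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))
    gt = torch.tensor([[a["bbox"][0], a["bbox"][1],
                        a["bbox"][0] + a["bbox"][2],
                        a["bbox"][1] + a["bbox"][3]] for a in anns])

    fig, axes = plt.subplots(1, 2, figsize=(12, 6))
    for ax, preds, name in [(axes[0], filtered_resnet[img_id], "ResNet50"),
                            (axes[1], filtered_mobilenet[img_id], "MobileNet")]:
        ax.imshow(img)
        draw_boxes(ax, gt, "lime")             # ground truth
        draw_boxes(ax, preds["boxes"], "red")  # predictions
        iou = torchvision.ops.box_iou(gt, preds["boxes"])
        mean_iou = iou.max(dim=1).values.mean().item() if len(preds["boxes"]) else 0.0
        ax.set_title(f"{name} - mean IoU: {mean_iou:.2f}")
    plt.show()
```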
Result
Supplement: IoU
- Ref: https://blog.csdn.net/IAMoldpan/article/details/78799857
- Ref: https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/
IoU (Intersection over Union) is a metric used to evaluate object detection algorithms. It is defined as the intersection area of the predicted box and the true box divided by their union area:

IoU = Area(predicted box ∩ true box) / Area(predicted box ∪ true box)

The value ranges between 0 and 1, where a higher value indicates a greater overlap between the predicted and true boxes, signifying a more accurate prediction.
Source: https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/
From the example above, you might be wondering: what exactly does the following code segment do?
```python
torchvision.ops.box_iou(bbox_tlist_anns, bbox_tlist_model)
```
- Essentially, it calculates the Intersection over Union (IoU) between all of the ground-truth bounding boxes and all of the model's predicted bounding boxes. It returns a tensor of shape (number of ground-truth boxes, number of predicted boxes). Refer to the images below.
Here, we only need the maximum IoU for each ground-truth bounding box, averaged over all ground-truth boxes, so we use the following code:
```python
# After obtaining the maximum value for each annotation box (see the IoU supplement for details), calculate the average IoU
```
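Concretely, the computation might look like this (a sketch reusing the tensor names from the snippet above):

```python
import torchvision

iou = torchvision.ops.box_iou(bbox_tlist_anns, bbox_tlist_model)
# For each ground-truth box (row), take its best-matching prediction (max over columns),
# then average those maxima to get the overall IoU
mean_iou = iou.max(dim=1).values.mean()
```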
You might be curious: would using functions like `max()`, `mean()`, or `sum()` affect our results?
As we can see from the image above:
- Using `sum()`, the value can exceed 1, which is not a reasonable range for an IoU.
- Using `max()`, for each ground-truth bounding box we choose the model's closest predicted bounding box as its IoU. We then take these maximum IoU values for all ground-truth bounding boxes and average them to determine the overall IoU.
- Using `mean()` poses a problem: the resulting IoU can never be 1, because it also counts the IoUs against all the other bounding boxes, which drags the overall value down. For instance, if the ground truth has two bounding boxes [A1, A2] and the model also predicts two [B1, B2], where B1 clearly matches A1 and B2 matches A2, the model predicts accurately. However, taking the mean also pairs B1 with A2 and B2 with A1, and those pairs have low IoUs. Using `mean()` in this way unjustly lowers the IoU, making it unreasonable.
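A small numeric check of the three reductions, using the [A1, A2] / [B1, B2] scenario above (the 0.9/0.1 values are made up for illustration):

```python
import torch

# Rows = ground-truth boxes [A1, A2]; columns = predicted boxes [B1, B2]
iou = torch.tensor([[0.9, 0.1],
                    [0.1, 0.9]])

print(iou.sum())                     # tensor(2.0) -> exceeds 1, not a valid IoU
print(iou.max(dim=1).values.mean())  # tensor(0.9) -> matches each GT box to its best prediction
print(iou.mean())                    # tensor(0.5) -> dragged down by the non-matching pairs
```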