
DeepForest and RetinaNet


In every blog post so far we have covered how DeepForest works, its use cases, and how to get more out of it. As we know, DeepForest uses deep learning object detection networks to predict bounding boxes corresponding to individual trees in RGB imagery. DeepForest is built on the RetinaNet model and is designed to make training models for tree detection simpler.

So it is worth talking about RetinaNet for object detection, so that we understand how DeepForest actually predicts trees from image rasters.

The first question that arises is: why RetinaNet rather than any other model?

Many papers pointed out that one-stage object detectors could not compete with two-stage detectors in terms of accuracy. RetinaNet, although it is a one-stage detector, overcomes this problem: it outperformed the best two-stage detectors of its time while remaining fast.

One-Stage and Two-Stage detectors

One-Stage detectors

One-stage detectors make predictions directly over a dense grid of locations in the image; there is no intermediate region-proposal step. They take an image as input, pass it through a number of convolutional layers, and predict, at every grid location, bounding boxes that are likely to contain an object together with a class score for each box. These models use pre-trained image classifiers as their backbone network to extract features from the image. This results in a simpler and faster model, but one that has historically lacked the accuracy of two-stage detectors.
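To make "predictions on a grid" more concrete, here is a minimal sketch of how a one-stage detector can place anchor boxes at every cell of a feature map; the network then scores and refines each of these anchors in a single pass. The function name grid_anchors, the cell-centre convention and the example sizes and ratios are illustrative assumptions, not DeepForest's or torchvision's actual code.

```python
import itertools
import math


def grid_anchors(feat_h, feat_w, stride, sizes, ratios):
    """Place anchor boxes (x1, y1, x2, y2) at every cell of a feat_h x feat_w grid.

    Each grid cell covers a stride x stride patch of the input image, and one
    anchor is created per (size, aspect ratio) pair at the centre of every cell.
    """
    anchors = []
    for row, col in itertools.product(range(feat_h), range(feat_w)):
        # Centre of this grid cell in input-image pixel coordinates.
        cx = (col + 0.5) * stride
        cy = (row + 0.5) * stride
        for size, ratio in itertools.product(sizes, ratios):
            # ratio = height / width; the anchor area stays equal to size ** 2.
            w = size / math.sqrt(ratio)
            h = size * math.sqrt(ratio)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


# A tiny 2 x 2 feature map with stride 8, one size and three aspect ratios.
anchors = grid_anchors(2, 2, stride=8, sizes=[32], ratios=[0.5, 1.0, 2.0])
print(len(anchors))  # 12 anchors = 2 * 2 cells * 3 ratios
```

Even this toy grid produces one candidate box per cell and ratio; on a real image with several feature levels the count grows very quickly.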



Two-Stage detectors

In contrast with one-stage detectors, two-stage detectors use two stages to identify the objects in the image.

The first stage is a region proposal network (RPN), which significantly reduces the number of locations that are likely to contain objects (these candidate locations are sometimes called Regions of Interest, or RoIs). So, in the second stage, we don't have to search over every location in the image, only over the regions proposed by the RPN.

Two-stage detectors also use a pre-trained image classifier as the backbone network.

Sampling techniques such as Online Hard Example Mining (OHEM), or fixing the ratio of foreground to background examples, are also used to keep the classes balanced during training.
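The fixed foreground-to-background ratio idea can be sketched as follows. This is a minimal illustration, assuming each candidate region already has a 0/1 label; the function name sample_minibatch and the values 256 and 0.25 are hypothetical choices for the example, and OHEM (which instead picks the highest-loss examples) is not shown.

```python
import random


def sample_minibatch(labels, batch_size=256, fg_fraction=0.25, seed=0):
    """Pick candidate indices so that at most fg_fraction of the batch is foreground.

    labels: one entry per candidate region, 1 = object (foreground), 0 = background.
    """
    rng = random.Random(seed)
    fg = [i for i, y in enumerate(labels) if y == 1]
    bg = [i for i, y in enumerate(labels) if y == 0]
    # Cap the number of foreground samples, then fill the rest with background.
    num_fg = min(len(fg), int(batch_size * fg_fraction))
    num_bg = min(len(bg), batch_size - num_fg)
    return rng.sample(fg, num_fg) + rng.sample(bg, num_bg)


# 10 objects among 5,000 background candidates: all 10 objects are kept,
# and background is subsampled instead of overwhelming the batch.
labels = [1] * 10 + [0] * 5000
idx = sample_minibatch(labels)
print(len(idx), sum(labels[i] for i in idx))
```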

In the second stage, classification is performed on the proposed object locations, and each object is labelled based on the model's confidence.

Two-stage detectors are more accurate than one-stage detectors, but they are much slower.

The problem with one-stage detectors

In a two-stage detector, the first stage (the region proposal network) significantly reduces the number of candidate object locations, and sampling techniques are then used to deal with the remaining imbalance between foreground and background.

A one-stage detector, by contrast, has to evaluate a very large number of candidate locations. Most of them are easy examples (mostly background) that are classified correctly without effort and carry little useful information, while the hard examples that do carry important information are few in number.
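To get a feeling for how large "a very large number of candidate locations" is, here is a rough count for a RetinaNet-style detector. It assumes the RetinaNet paper's setup of pyramid levels P3 to P7 (strides 8 to 128) with 9 anchors per location (3 scales x 3 aspect ratios), and assumes each feature map has ceil(image_size / stride) cells per side; real implementations may differ slightly in padding and resizing.

```python
import math


def count_anchors(img_h, img_w, strides=(8, 16, 32, 64, 128), anchors_per_loc=9):
    """Approximate number of anchors a RetinaNet-style detector scores per image."""
    total = 0
    for s in strides:
        # Assumed feature map size at this pyramid level.
        locations = math.ceil(img_h / s) * math.ceil(img_w / s)
        total += locations * anchors_per_loc
    return total


print(count_anchors(800, 800))  # 120087
```

Under these assumptions an 800 x 800 image yields roughly 120k anchors, and only a handful of them overlap real objects.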

Suppose cross-entropy (CE) is used as the loss function, and there are 100k easy examples with an average loss of 0.1 and 100 hard examples with an average loss of 2. The easy examples contribute a total loss of 100,000 × 0.1 = 10,000, while the hard ones contribute only 100 × 2 = 200, so the easy examples clearly dominate. The model ends up focusing on easy examples instead of hard ones, and its accuracy suffers.

The total loss from easy examples is 50 times that of the hard examples. Because of this huge imbalance between easy and hard examples, plain CE is not the right choice for a one-stage detector.
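The arithmetic above can be checked in a few lines. The sketch also converts each per-example loss back into the probability p_t that the model assigns to the true class (for cross-entropy, loss = -log(p_t), so p_t = exp(-loss)), which shows that the "easy" examples are already classified confidently; the variable names are only for illustration.

```python
import math

# Per-example losses and counts taken from the example in the text.
easy_loss, n_easy = 0.1, 100_000
hard_loss, n_hard = 2.0, 100

# Confidence in the true class implied by each loss (p_t = exp(-CE)).
p_easy = math.exp(-easy_loss)  # about 0.905: confidently correct
p_hard = math.exp(-hard_loss)  # about 0.135: mostly wrong

easy_total = n_easy * easy_loss
hard_total = n_hard * hard_loss
print(f"easy: p_t={p_easy:.3f}, total loss={easy_total:.0f}")
print(f"hard: p_t={p_hard:.3f}, total loss={hard_total:.0f}")
print(f"easy/hard = {easy_total / hard_total:.0f}x")
```

Even though each easy example is already predicted with about 90% confidence, together they still contribute 50 times more loss, and therefore more gradient, than the hard examples.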

RetinaNet

RetinaNet is a one-stage detector that replaces cross-entropy with the focal loss, which reduces the loss contributed by "easy" examples so that training focuses on the "hard" ones.
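A minimal sketch of the focal loss from the RetinaNet paper, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), applied to the same easy/hard example populations as above (per-example CE of 0.1 and 2.0). The paper uses γ = 2 and α = 0.25; here α_t is left at 1 so that the effect of the (1 - p_t)^γ factor alone is visible. This is a scalar illustration only; in PyTorch, torchvision.ops.sigmoid_focal_loss provides a batched version.

```python
import math


def focal_loss(p_t, gamma=2.0, alpha_t=1.0):
    """Focal loss for the probability p_t the model assigns to the true class.

    With gamma=0 and alpha_t=1 this is ordinary cross-entropy, -log(p_t).
    """
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Same populations as the cross-entropy example: CE of 0.1 vs 2.0 per example.
p_easy, n_easy = math.exp(-0.1), 100_000
p_hard, n_hard = math.exp(-2.0), 100

easy_total = n_easy * focal_loss(p_easy)
hard_total = n_hard * focal_loss(p_hard)
print(f"easy total={easy_total:.1f}, hard total={hard_total:.1f}")
```

The modulating factor shrinks each easy example's loss by roughly a factor of 100 (from 0.1 to about 0.0009) but a hard example's only by about 25%, so the 100 hard examples now contribute more total loss than the 100,000 easy ones.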

As shown in the figure, RetinaNet is a single, unified network composed of a backbone network and two task-specific subnetworks. The backbone is a ResNet combined with a Feature Pyramid Network (FPN), which builds feature maps at several scales on top of the ResNet.



The backbone is responsible for computing a convolutional feature map over the entire input image and is an off-the-shelf convolutional network.

The first subnet performs convolutional object classification on the backbone’s output; the second subnet performs convolutional bounding box regression.
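The box regression subnet does not predict raw coordinates; it predicts offsets relative to each anchor. The sketch below decodes such offsets using the centre/size parameterisation common to R-CNN-style detectors, which is roughly what torchvision's detection code applies (the library version adds per-coordinate weights and clamping). The function name decode_box and the plain-tuple box format are assumptions for illustration.

```python
import math


def decode_box(anchor, deltas):
    """Apply regression offsets (dx, dy, dw, dh) to an anchor (x1, y1, x2, y2).

    The centre moves by a fraction of the anchor's width/height, and the
    width/height are scaled by exp(dw) and exp(dh).
    """
    x1, y1, x2, y2 = anchor
    dx, dy, dw, dh = deltas
    wa, ha = x2 - x1, y2 - y1
    cxa, cya = x1 + 0.5 * wa, y1 + 0.5 * ha
    # Shift the centre, then rescale the size.
    cx, cy = cxa + dx * wa, cya + dy * ha
    w, h = wa * math.exp(dw), ha * math.exp(dh)
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


anchor = (0.0, 0.0, 10.0, 10.0)
print(decode_box(anchor, (0.0, 0.0, 0.0, 0.0)))  # zero offsets keep the anchor
print(decode_box(anchor, (0.1, 0.0, 0.0, 0.0)))  # shift right by 10% of the width
```

Predicting log-scale size changes keeps the regression targets small and well behaved regardless of the anchor size.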

The final layer of the classification subnet is initialized so that, at the start of training, the prior probability of detecting an object at every anchor is 0.01, which enables effective learning. The authors' first attempt, training with standard cross-entropy loss and no such initialization, failed quickly, with the network diverging during training.
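The prior-probability initialization amounts to choosing the bias b of the last classification layer so that sigmoid(b) = π, i.e. b = -log((1 - π) / π), as described in the RetinaNet paper. The small sketch below computes it for π = 0.01; as far as I know, torchvision's RetinaNet classification head exposes this as a prior_probability argument defaulting to 0.01. The helper names are hypothetical.

```python
import math


def prior_bias(pi=0.01):
    """Bias for the final classification layer so that sigmoid(bias) == pi."""
    return -math.log((1.0 - pi) / pi)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


b = prior_bias(0.01)
print(f"b = {b:.3f}, sigmoid(b) = {sigmoid(b):.3f}")  # b = -4.595, sigmoid(b) = 0.010
```

With this bias every anchor starts out predicting "background" with about 99% confidence, so the tens of thousands of background anchors produce only a small loss in the first iterations instead of a huge one that destabilizes training.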

Results

RetinaNet trained with focal loss achieved strong results even on the challenging COCO benchmark: it matched the speed of previous one-stage detectors while surpassing the accuracy of all existing one-stage and two-stage detectors at the time, delivering state-of-the-art performance.

How DeepForest uses RetinaNet

First, we load torchvision's RetinaNet with a pretrained ResNet-50-FPN backbone:

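import torchvision

def load_backbone():
    """A torchvision RetinaNet model with a pretrained ResNet-50-FPN backbone"""
    # Newer torchvision releases deprecate pretrained=True in favour of the weights argument
    backbone = torchvision.models.detection.retinanet_resnet50_fpn(pretrained=True)

    # Return the full model; create_model() only reuses its .backbone attribute
    return backbone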


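Then we can create the model with the following snippet. It keeps only the backbone (ResNet-50 + FPN) of the pretrained model and attaches new classification and box-regression heads sized for num_classes: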

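from torchvision.models.detection.retinanet import RetinaNet

def create_model(num_classes, nms_thresh, score_thresh, backbone=None):
    """Create a RetinaNet model

    Args:
        num_classes (int): number of classes in the model
        nms_thresh (float): non-max suppression threshold for
            intersection-over-union [0,1]
        score_thresh (float): minimum prediction score to keep
            during prediction [0,1]
        backbone: optional feature extractor; defaults to the pretrained
            ResNet-50-FPN returned by load_backbone()
    Returns:
        model: a pytorch nn module
    """
    if backbone is None:
        resnet = load_backbone()
        # Keep only the ResNet-50 + FPN feature extractor; the heads are rebuilt below
        backbone = resnet.backbone

    # New classification and box-regression heads sized for num_classes
    model = RetinaNet(backbone=backbone, num_classes=num_classes)
    model.nms_thresh = nms_thresh
    model.score_thresh = score_thresh

    return model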
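The two thresholds passed to create_model control how the raw detections are filtered at prediction time. The pure-Python sketch below illustrates the idea: boxes scoring at or below score_thresh are dropped, and greedy non-maximum suppression then removes any box whose intersection-over-union (IoU) with a higher-scoring kept box exceeds nms_thresh. It is an illustration under these assumptions, not torchvision's implementation, which performs the same steps with vectorised tensor operations and per-class NMS; the function names and the threshold values are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def postprocess(boxes, scores, score_thresh=0.05, nms_thresh=0.5):
    """Return indices of the kept boxes, highest score first."""
    # 1. Score filtering.
    candidates = [i for i, s in enumerate(scores) if s > score_thresh]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    # 2. Greedy NMS: keep a box only if it does not overlap a kept box too much.
    keep = []
    for i in candidates:
        if all(iou(boxes[i], boxes[k]) <= nms_thresh for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.7, 0.01]
print(postprocess(boxes, scores))  # [0, 2]
```

Box 3 is removed by the score threshold, and box 1 is suppressed because it overlaps box 0 with an IoU of about 0.68. For dense tree crowns, a lower nms_thresh removes more overlapping boxes, which can also merge neighbouring trees, so the value is a trade-off.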


