DeepForest Model Training

DeepForest uses deep learning object detection networks to predict bounding boxes corresponding to individual trees in RGB imagery. It is built on the RetinaNet model from the torchvision package and is designed to make training tree-detection models simpler.

Prebuilt models are almost always improved by adding data from the target area. In our work, we found that even an hour of carefully selected hand annotations can greatly improve accuracy and precision. We believe that at least some fine-tuning of prebuilt models is worthwhile for most scientific applications. When fine-tuning off-the-shelf models, we found that 5-10 epochs were sufficient.

The dataset we used to train and test our model was urban-tree-detection-data.

The libraries we used:

import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader
import deepforest
from deepforest import main
import pytorch_lightning as pl
import os
import rasterio
import glob


We had many CSV files to train on. Each contained only two columns, x and y, giving the location of a tree crown, but the DeepForest model takes bounding boxes as input, not point coordinates. We therefore built a bounding box around each point by extending it by a margin of 1 in every direction, and added the image_path column that the CSV file also needs. The code snippet is:

# making data feasible to train on our model
df = pd.read_csv(annotationFileTemp)
df['image_path'] = image_path
# box edges: 1 unit on either side of each point
df['xmin'] = df['x'] - 1
df['xmax'] = df['x'] + 1
df['ymin'] = df['y'] - 1
df['ymax'] = df['y'] + 1
df['label'] = "Tree"
# index=False keeps the pandas row index out of the CSV
df.to_csv("/content/data/annotation.csv", index=False)


Here annotationFileTemp is the original CSV file, which contains only 2D coordinates. Our dataset is now ready.
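The same conversion can be wrapped in a small helper. This is a minimal sketch, not the code from our notebook: the function name `points_to_boxes`, the `half_size` parameter, and the clipping to the image bounds are assumptions added for illustration. It uses only the standard library so it runs without pandas.

```python
def points_to_boxes(points, image_path, width, height, half_size=1, label="Tree"):
    """Turn (x, y) tree-crown points into box rows with DeepForest-style columns.

    Hypothetical helper: `half_size`, `width` and `height` are illustrative
    parameters, not part of the original notebook.
    """
    rows = []
    for x, y in points:
        rows.append({
            "image_path": image_path,
            # Clip each edge so the box never leaves the image.
            "xmin": max(0, x - half_size),
            "ymin": max(0, y - half_size),
            "xmax": min(width, x + half_size),
            "ymax": min(height, y + half_size),
            "label": label,
        })
    return rows


# Two example points on an assumed 256 x 256 image; the second sits on the edge.
boxes = points_to_boxes([(10, 20), (0, 255)], "tile_1.tif", width=256, height=256)
print(boxes[0])
print(boxes[1])
```

Making the margin a parameter makes it easy to try larger boxes later, since a margin of 1 gives very small boxes compared with a real tree crown.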

We tell the config that we want to train on this csv file, and that the images are in the same directory. If images are in a separate folder, change the root_dir.

# model configuration
annotationFile = "/content/data/annotation.csv"

# "-1" asks for all available GPUs
self.model.config["gpus"] = "-1"
self.model.config["train"]["epochs"] = 5
self.model.config["epochs"] = 5
self.model.config["train"]["csv_file"] = annotationFile
# minimum confidence score for a predicted box
self.model.config["score_thresh"] = 0.4
# the images sit in the same directory as the annotation file
self.model.config["train"]["root_dir"] = os.path.dirname(annotationFile)
self.model.config["train"]["fast_dev_run"] = False


The config exposes a whole set of parameters, and any additional PyTorch Lightning argument can also be passed to the trainer.
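Since the config is edited like a nested dictionary, the overrides can also be collected in one place and merged in a single step. The sketch below assumes exactly that dictionary-like behaviour and uses a plain dict as a stand-in for the model config. The helper `apply_overrides` and the starting values in `base_config` are hypothetical; only the key names come from the snippet above.

```python
import copy


def apply_overrides(config, overrides):
    """Return a copy of `config` with the nested `overrides` merged in."""
    merged = copy.deepcopy(config)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so sibling keys in the nested section are kept.
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = value
    return merged


# Stand-in for the model config; the starting values are placeholders.
base_config = {
    "score_thresh": 0.1,
    "train": {"epochs": 1, "csv_file": None, "root_dir": None, "fast_dev_run": False},
}

annotation_file = "/content/data/annotation.csv"
merged = apply_overrides(base_config, {
    "score_thresh": 0.4,
    "train": {"epochs": 5, "csv_file": annotation_file, "root_dir": "/content/data"},
})
print(merged["train"])
```

Keeping the overrides in one dictionary makes it easier to see at a glance which settings differ from the defaults.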

To begin training, we create a PyTorch Lightning trainer and call trainer.fit on the model object itself. While this might look a touch awkward, it is useful for exposing the PyTorch Lightning functionality. In short, we create a trainer and then fit the model with it:


model.create_trainer()
model.use_release()
model.trainer.fit(model)


However, training failed with the following error:

"ReferenceError: weakly-referenced object no longer exists"

The error came from model.trainer, so we skipped create_trainer, created a PyTorch Lightning trainer outside of DeepForest, and fit the model with that trainer instead:

# creating trainer to train the model
trainer = pl.Trainer(
    max_epochs=5,
    gpus="-1",
    enable_checkpointing=False,
    accelerator='gpu',
    fast_dev_run=False,
)
# training model on the respective data
trainer.fit(model=self.model)

As mentioned earlier, we had multiple CSV files, so we ran the training in a loop. On each iteration the model was trained further on the next CSV file.


training = Training()
counter = 0

# Giving our model respective data files to train
for file in glob.iglob("/content/urban-tree-detection-data/csv/*"):
    if counter >= 100:
        break  # stop after 100 files
    counter += 1
    # the image shares the CSV file's base name, with a .tif extension
    base_name = os.path.splitext(os.path.basename(file))[0]
    file_name = f"/content/urban-tree-detection-data/images/{base_name}.tif"
    try:
        training.training_of_model(testing_csv=file, image_path=file_name)
    except Exception:
        print(f"File has some invalid data, Filename : {file}")
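The file matching in this loop can be separated from the training itself. The following is a rough sketch under our own assumptions rather than code from the notebook: the helper `pair_csv_with_images` and its `limit` and `image_suffix` parameters are hypothetical, and the demo builds a throwaway directory instead of touching the real dataset.

```python
import tempfile
from pathlib import Path


def pair_csv_with_images(csv_dir, image_dir, limit=100, image_suffix=".tif"):
    """Match each CSV file to the image with the same base name.

    Returns (pairs, missing): pairs are (csv_path, image_path) strings,
    missing lists the CSV files that have no matching image.
    """
    pairs, missing = [], []
    for csv_path in sorted(Path(csv_dir).glob("*.csv")):
        if len(pairs) >= limit:
            break
        image_path = Path(image_dir) / (csv_path.stem + image_suffix)
        if image_path.exists():
            pairs.append((str(csv_path), str(image_path)))
        else:
            missing.append(str(csv_path))
    return pairs, missing


# Demo on a temporary directory: two CSV files, only one has an image.
with tempfile.TemporaryDirectory() as root:
    csv_dir = Path(root, "csv")
    image_dir = Path(root, "images")
    csv_dir.mkdir()
    image_dir.mkdir()
    for name in ("a", "b"):
        (csv_dir / f"{name}.csv").write_text("x,y\n1,2\n")
    (image_dir / "a.tif").write_bytes(b"")

    pairs, missing = pair_csv_with_images(csv_dir, image_dir)
    demo_pair_names = [(Path(c).name, Path(i).name) for c, i in pairs]
    demo_missing_names = [Path(m).name for m in missing]

print(demo_pair_names)
print(demo_missing_names)
```

Checking for missing images before training means a failure inside the training call points to bad annotation data rather than a wrong file path.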


Link to the colab notebook: Colab Notebook
