Blogs

Posts

Showing posts from September, 2022

GSoC Final Report

My journey through the Google Summer of Code project passed by so fast. A lot happened during those three months, and as I'm writing this blog post, I feel quite nostalgic about them. GSoC was indeed a fantastic experience. It gave me an opportunity to grow as a developer in an open source community, and I believe I finished GSoC with a better understanding of what open source is. I learned more about the community, how to communicate with its members, and who the actors in this workflow are. So, this is a summary report of my whole journey at GSoC 2022.

Name: Ansh Dassani
Organization: NumFOCUS - Data Retriever
Project title: Training and Evaluation of model on various resolutions
Project link: DeepForest
Mentors: Ben Weinstein, Henry Senyondo, Ethan White

Introduction

DeepForest is a pytho...

Deep Learning

What is deep learning? Deep learning is a subset of machine learning that uses multi-layered neural networks to draw conclusions from input data without hand-written rules. Deep learning can be supervised, semi-supervised, or unsupervised, and it is based on representation learning: a set of techniques that allows a system to automatically discover the representations needed for feature detection or classification from raw data. This replaces manual feature engineering and allows a machine to both learn the features and use them to perform a specific task by learning from representative examples. For example, if you want to build a model that recognizes trees, you need to prepare a dataset that includes a lot of different tree images. The main architectures of deep learning are:
- Convolutional neural networks
- Recurrent neural networks
- Generative adversarial networks
- Recursive neural networks
I'll be talking about them more in a later part of this blog. Diffe...

Evaluation of predicted data

To convert the overlap between predicted bounding boxes and ground truth bounding boxes into a measure of accuracy and precision, the most common approach is to compare the overlap using the Intersection over Union (IoU) metric. IoU is the area of overlap between the predicted box and the ground truth box divided by the area of their union. The IoU metric ranges from 0 (no overlap at all) to 1 (complete overlap). In the wider computer vision literature, the common overlap threshold is 0.5, but this value is arbitrary and not tied to any particular ecological problem. We treat boxes with an IoU score greater than 0.4 as true positives, and boxes with scores less than 0.4 as false negatives. The value of 0.4 was chosen by visual assessment, as the threshold that indicates good visual agreement between predicted and observed crowns. We tested a range of overlap thresholds from 0.3 (less ...
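To make the IoU definition above concrete, here is a minimal sketch in plain Python. It assumes axis-aligned boxes given as (xmin, ymin, xmax, ymax) tuples; the function names `box_iou` and `is_true_positive` are made up for this illustration and are not part of DeepForest.

```python
def box_iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Each box is assumed to be (xmin, ymin, xmax, ymax).
    """
    # Corners of the intersection rectangle (may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so non-overlapping boxes give zero intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, truth_box, threshold=0.4):
    """Apply the IoU threshold described in the text (0.4 by default)."""
    return box_iou(pred_box, truth_box) > threshold
```

For example, two 10 x 10 boxes shifted by 5 units along x share half of each box, so the intersection is 50, the union is 150, and the IoU is 1/3, which falls below the 0.4 threshold.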

Model training on OSBS dataset

Reducing tile size

In the last blog, we discussed model training on the urban tree detection data; in this blog we'll train our model on the OSBS dataset (a link to the dataset is attached at the end of the blog). One might wonder what is new to learn here, so let me clarify the issue we ran into when working with this dataset. Sometimes the data we want to train on is very large and can cause CUDA memory errors during training, so we have to crop the dataset into smaller parts and train on them individually so that training does not run out of memory. High-resolution tiles may exceed GPU or CPU memory during training, especially in dense forests. To reduce the size of each tile, use preprocess.split_raster to divide the original tile into smaller pieces and create a corresponding annotations file. In this dataset we had .tif files for the raster images, and for each corresponding file we had raster d...
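The idea of dividing a large tile into smaller pieces can be sketched without any raster library. The helper below is hypothetical (`tile_windows` is not a DeepForest function, and this is not how preprocess.split_raster is implemented); it only illustrates how a patch size and a fractional overlap translate into crop windows, with edge windows clipped to the image bounds.

```python
def tile_windows(width, height, patch_size=400, patch_overlap=0.05):
    """Return (x, y, w, h) crop windows covering a width x height image.

    Illustrative sketch only: windows are patch_size pixels wide and tall,
    consecutive windows overlap by the fraction patch_overlap, and windows
    on the right and bottom edges are clipped to the image.
    """
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    if not 0 <= patch_overlap < 1:
        raise ValueError("patch_overlap must be in [0, 1)")

    # Distance in pixels between the starts of neighbouring windows.
    stride = max(1, round(patch_size * (1 - patch_overlap)))

    windows = []
    for y in range(0, height, stride):
        for x in range(0, width, stride):
            w = min(patch_size, width - x)
            h = min(patch_size, height - y)
            windows.append((x, y, w, h))
    return windows


if __name__ == "__main__":
    # A 1000 x 1000 px tile cut into 400 px patches without overlap.
    for window in tile_windows(1000, 1000, patch_size=400, patch_overlap=0.0):
        print(window)
```

With these example values the tile is cut into a 3 x 3 grid of nine windows, and the windows in the last row and column are only 200 px on their clipped side. Each crop then needs its own annotations with box coordinates shifted by the window's (x, y) offset, which is the bookkeeping the annotations file mentioned above takes care of.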

DeepForest Model Training

DeepForest uses deep learning object detection networks to predict bounding boxes corresponding to individual trees in RGB imagery. DeepForest is built on the RetinaNet model from the torchvision package and designed to make training models for tree detection simpler. Prebuilt models are always improved by adding data from the target area. In our work, we found that even an hour of carefully selected hand annotations can greatly improve accuracy and precision. We believe that at least some fine-tuning of the prebuilt models is worthwhile for most scientific applications. When starting training from the prebuilt model, we found that 5-10 epochs were sufficient. The dataset we used to train our model was urban-tree-detection-data. Libraries we used to achieve it:
from pytorch_lightning.loops import dataloader
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader
import deepforest
from deepforest import main
import ...

DeepForest and RetinaNet

In every blog we cover how DeepForest works, its use cases, and how we can make it more effective. As we know, DeepForest uses deep learning object detection networks to predict bounding boxes corresponding to individual trees in RGB imagery. DeepForest is built on the RetinaNet model and designed to make training models for tree detection simpler. So we should talk about RetinaNet for object detection, so that we understand how DeepForest actually predicts trees from image rasters. The first question that arises is: why RetinaNet over any other model? Many papers have noted that one-stage object detectors could not compete with two-stage detectors in terms of accuracy. RetinaNet, although a one-stage detector, overcomes this problem: its authors report that it outperformed the best two-stage detectors of its time while still being fast.

One-Stage and Two-Stage detectors

One-Stage detectors

One-Stage detectors make the predictions about the object in...

Sensitivity of model to input resolution

The DeepForest model, which was trained on 10 cm data at 400 px crops, is very sensitive to the input resolution and quality of images, and it tends to give inaccurate results on images that differ from its training data. It is not always possible to have drone images captured at the particular height that yields 10 cm resolution, so we have to come up with a way to get better results for multiple resolutions of data. There are two ways to get better predictions: preprocessing the data and retraining the model on different input resolutions. In preprocessing, we can try to find a patch size that gives better results: as the resolution of the input data decreases compared to the data used to train the model, the patch size needs to be larger, and we can pass the appropriate patch size to the predict_tile function. Retraining the model on different resolutions is more work, but we tried to achieve it by evalu...
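One way to pick a starting patch size is to keep the ground area covered by each patch the same as in training. The sketch below is a heuristic under stated assumptions, not a rule from DeepForest: it reads "resolution" as ground sample distance in cm per pixel (so a smaller number means finer imagery), and the function name `matched_patch_size` is hypothetical. The 10 cm and 400 px defaults come from the training setup described above.

```python
def matched_patch_size(target_cm_per_px, train_cm_per_px=10.0, train_patch_px=400):
    """Patch size in pixels that covers the same ground area as in training.

    Heuristic sketch: a 400 px patch at 10 cm per pixel spans 40 m on the
    ground, so imagery at another cm-per-pixel value needs a different
    number of pixels to span the same 40 m.
    """
    if target_cm_per_px <= 0:
        raise ValueError("target_cm_per_px must be positive")

    ground_extent_cm = train_patch_px * train_cm_per_px
    return max(1, round(ground_extent_cm / target_cm_per_px))


if __name__ == "__main__":
    # Compare candidate patch sizes for a few ground sample distances.
    for cm_per_px in (5.0, 10.0, 20.0):
        print(cm_per_px, matched_patch_size(cm_per_px))
```

The result is only a first guess to pass as the patch size when predicting on a tile; in practice it is worth trying a few nearby values and checking the predictions visually.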