Model training on OSBS dataset

Reducing tile size

In the last blog we trained the model on urban tree detection data; in this one we train it on the OSBS dataset (a link to the dataset is at the end of the post). At first glance there may not seem to be anything new to learn here, but this dataset raised a practical problem: the tiles are large enough that training on them directly produces CUDA out-of-memory errors. The fix is to crop the dataset into smaller pieces and train on those pieces individually, so the model never runs out of memory.

High-resolution tiles may exceed GPU or CPU memory during training, especially in dense forests. To reduce the size of each tile, use preprocess.split_raster to divide the original tile into smaller pieces and create a corresponding annotations file.

In this dataset, each raster image is a .tif file, and each .tif has an accompanying shapefile made up of several files that share the same name but have different extensions: a .shp file (the geometry for all features), a .shx file (the index into that geometry), a .prj file (a plain-text description of the coordinate system and projection in well-known text (WKT) format), and a .dbf file (the feature attributes in tabular form).
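A quick way to sanity-check what the shapefile actually contains is to open it with geopandas; this is just an illustration, and the file name below is a placeholder for whichever OSBS shapefile you are working with:

import geopandas as gpd

# Reading the .shp picks up the .shx, .dbf and .prj siblings automatically
gdf = gpd.read_file("/content/dataConversion/OSBS_trees.shp")  # placeholder path
print(gdf.crs)              # coordinate system parsed from the .prj file
print(gdf.geometry.head())  # tree crown geometries from the .shp/.shx pair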


As we know, DeepForest cannot train directly on shapefiles, so we have to convert them to CSV format, and DeepForest provides a utility function to do exactly that:

from deepforest import utilities

# Convert the shapefile geometries into a DeepForest-style annotations dataframe
df = utilities.shapefile_to_annotations(shapefile, rgb)
df.to_csv("/content/dataConversion/shapefile_annotation.csv")


But if we train directly on this CSV, we will hit a CUDA memory error, something like this:

"RuntimeError: CUDA out of memory. Tried to allocate 44.45 GiB (GPU 0; 14.76 GiB total capacity; 747.96 MiB already allocated; 12.95 GiB free; 766.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF"

So we cannot train the model on this CSV file directly. Instead, we split the raster into smaller files and produce a single CSV of annotations for the crops, which the model can then train on.

To achieve this, DeepForest has a function that is very handy in exactly this situation:

from deepforest import preprocess

# Split the large raster into 400 px crops and build the matching annotations file
df2 = preprocess.split_raster(
    annotations_file="/content/dataConversion/shapefile_annotation.csv",
    path_to_raster=rgb, numpy_image=None,
    base_dir="/content/dataConversion",
    patch_size=400, patch_overlap=0.05,
    allow_empty=False, image_name=None)

df2.to_csv(f"/content/dataConversion/cropRaster_annotation{counter}.csv")


Here patch_size is the size in pixels of each cropped raster image (we used 400), and patch_overlap is how much neighbouring patches overlap. One pattern we noticed is that the model is sensitive to predicting on image resolutions that differ from what it was trained on, and we found that increasing the patch size works better on higher-quality data.
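The same knob exists at prediction time: predict_tile also accepts patch_size and patch_overlap, so lower-resolution imagery can be paired with a larger patch. A minimal sketch, assuming a model with the release weights loaded and 800 px chosen purely as an illustration:

from deepforest import main

model = main.deepforest()
model.use_release()  # pretrained release weights

# A larger patch_size tends to help when the input resolution is coarser
# than the 10 cm data the release model was trained on
predictions = model.predict_tile(raster_path=rgb, patch_size=800, patch_overlap=0.05)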

Now we are good to go for training the model with the CSV file we acquired. I won't go into much detail on how to train the model, since we covered that in the last blog; the training snippet is below.

import os
import pytorch_lightning as pl
from deepforest import main

annotationFile = f"/content/dataConversion/cropRaster_annotation{i+1}.csv"

# Initialize DeepForest and point it at the cropped-raster annotations
model = main.deepforest()

model.config["gpus"] = "-1"
model.config["score_thresh"] = 0.4
model.config["train"]["epochs"] = 5
model.config["train"]["csv_file"] = annotationFile
model.config["train"]["root_dir"] = os.path.dirname(annotationFile)
model.config["train"]["fast_dev_run"] = False

# Alternatively, let DeepForest build its own trainer:
# model.create_trainer()
# model.use_release()
# model.trainer.fit(model)

trainer = pl.Trainer(max_epochs=5,
                     gpus=-1,
                     accelerator="gpu",
                     enable_checkpointing=False,
                     fast_dev_run=False)

trainer.fit(model=model)
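Once training finishes, it is worth checking the fine-tuned model against a held-out annotations file and keeping a checkpoint. A minimal sketch, assuming an evaluation CSV in the same format as the training one (the paths are placeholders):

# Evaluate on held-out annotations (placeholder path)
results = model.evaluate(csv_file="/content/dataConversion/eval_annotation.csv",
                         root_dir="/content/dataConversion",
                         iou_threshold=0.4)
print(results["box_precision"], results["box_recall"])

# Save the fine-tuned weights for later prediction runs
trainer.save_checkpoint("/content/osbs_finetuned.ckpt")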


Link to the dataset:

Link to colab notebook:
