Abstract. Deep learning (DL) networks used for image segmentation must be trained on input images paired with masks that identify the target features in them. DL networks learn by iteratively adjusting the weights of their interconnected layers through backpropagation, which computes the gradients of a loss function so that the loss can be minimized. This lets the network learn patterns and relationships in the data and make predictions or classifications on new, unseen data. Training any DL network requires specifying hyperparameters such as input image size, batch size, and number of epochs. Poorly chosen hyperparameter values increase training time or result in incomplete learning. The aim of this study was to evaluate the effect of input image size and batch size on the performance of DeepLabV3+ using Sentinel-2A/B RGB images and labels obtained from Kaggle. We trained the DeepLabV3+ network six times, combining two input image sizes (128 × 128 and 256 × 256 pixels) with batch sizes of 4, 8, and 16. Each model was trained for 100 epochs to ensure that the loss curve reached saturation and the model converged to a stable solution. The predicted masks generated by each model were compared with the corresponding test masks using accuracy, precision, recall, and F1 score. An image size of 256 × 256 with a batch size of 4 achieved the highest performance. The results also suggest that a larger input image size improved DeepLabV3+ performance.
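The experimental design described in the abstract can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names `training_runs` and `pixel_metrics` are hypothetical, and the metric computation assumes binary (0/1) masks scored pixel by pixel, which the abstract does not state.

```python
from itertools import product

# Hyperparameter values taken from the abstract: 2 image sizes x 3 batch sizes.
IMAGE_SIZES = (128, 256)
BATCH_SIZES = (4, 8, 16)
EPOCHS = 100


def training_runs():
    """Enumerate the six (image_size, batch_size) configurations."""
    return [
        {"image_size": size, "batch_size": batch, "epochs": EPOCHS}
        for size, batch in product(IMAGE_SIZES, BATCH_SIZES)
    ]


def pixel_metrics(predicted, reference):
    """Pixel-wise accuracy, precision, recall and F1.

    Assumption: both masks are 2-D lists of 0/1 values with the same shape.
    """
    tp = fp = fn = tn = 0
    for pred_row, ref_row in zip(predicted, reference):
        for p, r in zip(pred_row, ref_row):
            if p and r:
                tp += 1  # feature predicted and present
            elif p:
                fp += 1  # feature predicted but absent
            elif r:
                fn += 1  # feature present but missed
            else:
                tn += 1  # background correctly predicted
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Calling `training_runs()` yields the six configurations, and `pixel_metrics` would be applied to each predicted mask and its test mask, with the scores averaged per configuration to compare the runs.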
