In object detection and classification, You Only Look Once (YOLO)-v3 has maintained its position as a leading detector and classifier owing to its speed, accuracy, relatively simple structure, and continual modification and improvement. This paper proposes several approaches to modify and improve YOLO-v3. Our main contributions are (1) increasing the number of anchor boxes; (2) replacing K-means clustering with density-based spatial clustering of applications with noise (DBSCAN); (3) introducing a final bounding box alignment (FBBA) technique; and (4) replacing multi-scaling with a depth calculation algorithm. First, we increased the number of anchor boxes from 9 to 11, which improved performance by 6% on both the Visual Object Classes (VOC) and Common Objects in Context (COCO) datasets, with a negligible increase in computational cost. Our analysis likewise indicates that choosing the right number of anchor boxes can increase efficiency. Therefore, instead of generating a fixed number of anchor boxes with K-means clustering, we propose DBSCAN to generate a dynamic, data-driven number of anchor boxes. To minimize localization error, we introduce a new FBBA method intended to make the final bounding box overlap the object exactly. Similarly, the depth calculation algorithm can address the detection and classification of small and large objects without multi-scaling.
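To illustrate how a density-based method can yield a data-driven number of anchor boxes, the following is a minimal sketch and not the authors' implementation. It assumes each ground-truth box is reduced to a (width, height) pair and that the distance between two boxes is 1 − IoU with the boxes aligned at a shared corner, the distance commonly used for K-means anchor clustering in YOLO. The names `iou_wh`, `dbscan`, `anchors_from_boxes`, the thresholds `eps` and `min_pts`, and the sample boxes are all hypothetical.

```python
from statistics import mean


def iou_wh(a, b):
    """IoU of two (width, height) boxes aligned at a shared corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def dbscan(boxes, eps, min_pts):
    """Plain DBSCAN over boxes; returns one label per box (-1 marks noise)."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(boxes)

    def neighbors(i):
        # All boxes within distance eps of box i (including i itself).
        return [j for j in range(len(boxes))
                if 1.0 - iou_wh(boxes[i], boxes[j]) <= eps]

    cluster = -1
    for i in range(len(boxes)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # core point: keep growing the cluster
                queue.extend(nbrs)
    return labels


def anchors_from_boxes(boxes, eps=0.3, min_pts=3):
    """One anchor per cluster (mean width, mean height), sorted by area."""
    labels = dbscan(boxes, eps, min_pts)
    anchors = []
    for c in sorted(set(labels) - {-1}):
        members = [b for b, lab in zip(boxes, labels) if lab == c]
        anchors.append((mean(w for w, _ in members),
                        mean(h for _, h in members)))
    return sorted(anchors, key=lambda a: a[0] * a[1])


# Hypothetical box sizes in pixels: three dense groups and one outlier.
example_boxes = [
    (10, 12), (11, 13), (12, 12), (11, 11),
    (50, 60), (52, 58), (48, 62), (51, 61),
    (120, 90), (118, 92), (122, 88), (121, 91),
    (400, 20),
]
anchors = anchors_from_boxes(example_boxes)
print(len(anchors), anchors)
```

Unlike K-means, the number of anchors is not passed in: it follows from how many dense groups the data contains under the chosen `eps` and `min_pts`, and isolated box shapes are treated as noise rather than pulling a centroid toward them.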