Abstract
We propose an object detection method that predicts oriented bounding boxes (OBB) to estimate object locations, scales, and orientations. The method is based on YOLO (You Only Look Once), one of the top detection algorithms in terms of both accuracy and speed. Existing object detection methods use horizontal bounding boxes (HBB) to detect targets, and these are not robust to orientation variance. The proposed orientation invariant YOLO (OIYOLO) detector can effectively handle bird’s eye view images, in which objects appear at arbitrary orientation angles. To estimate the rotation angle of objects, we design a new angle loss function. Training OIYOLO with this loss forces the network to learn the annotated orientation angle of each object, which makes OIYOLO orientation invariant. The proposed OBB prediction approach can also be applied in other detection frameworks. In addition, to evaluate the proposed OIYOLO detector, we create the UAV-DAHUA dataset, which is accurately annotated with object locations, scales, and orientation angles. Extensive experiments on the UAV-DAHUA and DOTA datasets demonstrate that OIYOLO achieves state-of-the-art detection performance with high efficiency compared with the baseline YOLO algorithms.
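To make the difference between the two box types concrete, the following is a minimal sketch and is not taken from the paper. It assumes an OBB is parameterized as a center, a width, a height, and a rotation angle in radians; the paper's exact parameterization and angle convention may differ. The function names `obb_to_corners` and `enclosing_hbb` are hypothetical, chosen only for illustration.

```python
import math


def obb_to_corners(cx, cy, w, h, theta):
    """Return the four corners of an oriented box.

    Assumed (hypothetical) parameterization: center (cx, cy), size w x h,
    rotation theta in radians applied about the center.
    """
    c, s = math.cos(theta), math.sin(theta)
    # Corner offsets of the unrotated box, relative to its center.
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each offset by theta, then translate to the center.
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in half]


def enclosing_hbb(corners):
    """Return the axis-aligned box (x_min, y_min, x_max, y_max) around the corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


def hbb_area(box):
    x_min, y_min, x_max, y_max = box
    return (x_max - x_min) * (y_max - y_min)
```

Under these assumptions, a 4 x 2 object rotated by 45 degrees keeps an OBB area of 8, while its enclosing HBB grows to an area of about 18. The extra area is background, which is one way to see why an HBB describes rotated objects poorly.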
Highlights
Object detection is an important and challenging technology related to computer vision
We propose an object detection method that predicts oriented bounding boxes (OBB) to estimate object locations, scales, and orientations based on YOLO (You Only Look Once)
We compare the performance of orientation invariant YOLO (OIYOLO) and YOLO on bird’s eye view images
Summary
Object detection is an important and challenging technology related to computer vision. It can be divided into two components: object recognition and object localization. Object localization is usually achieved by regressing the positions and scales of targets. Deep learning has fundamentally changed how computers perform object detection. In two-stage approaches, proposals are first generated by selective search[5,6,10,11] or a region proposal network[7], and classification and regression are then performed on them. Although two-stage methods achieve top performance on several challenging object detection benchmarks, including PASCAL VOC[12] and MSCOCO[13], they are often too slow for real-time applications, even on hardware with high computational capability. Because of its high efficiency, the one-stage approach has recently attracted much more attention.