Abstract

Multi-scale object detection is a fundamental challenge in computer vision. Although many advanced methods based on convolutional neural networks have succeeded on natural images, progress on aerial images has been relatively slow, mainly due to the huge scale variations among objects and the many densely distributed small objects. In this paper, considering that the semantic information of small objects may be weakened or even disappear in the deeper layers of a neural network, we propose a new detection framework called the Extended Feature Pyramid Network (EFPN) to strengthen the information extraction ability of the network. In the EFPN, we first design a multi-branched dilated bottleneck (MBDB) module in the lateral connections to capture much more semantic information. Then, we further devise an attention pathway for better locating objects. Finally, an augmented bottom-up pathway is introduced to make shallow-layer information easier to propagate and to further improve performance. Moreover, we present an adaptive scale training strategy that enables the network to better recognize multi-scale objects. Meanwhile, we present a novel clustering method to obtain adaptive anchors and help the neural network better learn data features. Experiments on public aerial datasets indicate that the presented method obtains state-of-the-art performance.
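The abstract mentions a clustering method for obtaining adaptive anchors but does not spell it out here. As a hedged illustration only, a standard IoU-based k-means over ground-truth box widths and heights (the approach popularized by YOLOv2, not necessarily the paper's novel method) can be sketched as follows; all function names and the distance choice are assumptions:

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between (N, 2) box sizes and (K, 2) anchor sizes, both
    treated as rectangles centered at the origin (width/height only)."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
             * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    union = ((boxes[:, 0] * boxes[:, 1])[:, None]
             + (anchors[:, 0] * anchors[:, 1])[None, :] - inter)
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs using 1 - IoU as the distance, so anchors
    adapt to the box shapes actually present in the dataset."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        # Assign each box to the anchor it overlaps best.
        assign = iou_wh(boxes, anchors).argmax(axis=1)
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else anchors[i] for i in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors[np.argsort(anchors.prod(axis=1))]  # sorted by area
```

Run on a set of annotated box sizes, this returns `k` anchor shapes sorted from smallest to largest area, which can then be assigned to pyramid levels.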

Highlights

  • With the rapid development of deep convolutional neural networks (CNNs) [1] in recent years, the conventional object detection methods [2,3] have made some remarkable achievements in natural images

  • The Extended Feature Pyramid Network (EFPN) achieves the best result, with a mean average precision (mAP) of 74.67%

  • Because the datasets contain objects with vastly different scales and the scale average precision (AP) of some categories is small, the average scale AP values (APS, APM, APL) over all categories are generally smaller than the mAP

Summary

Introduction

With the rapid development of deep convolutional neural networks (CNNs) [1] in recent years, the conventional object detection methods [2,3] have made some remarkable achievements in natural images. Many object detectors based on deep learning have avoided the multi-scale image pyramid representation, mainly because it requires a lot of calculations and memory. Lin et al. [11] exploited the multi-scale pyramid structure in deep CNNs to construct the Feature Pyramid Network (FPN) at a small additional cost. The FPN (Figure 1b) adopts a bottom-up pathway, a top-down pathway and lateral connections to construct high-level semantic information at each scale. This structure displays an obvious improvement as a common feature extractor in some practical applications. However, since large-scale objects are usually produced and predicted in the deeper convolution layers of the FPN, the boundaries of these objects might become too fuzzy to obtain accurate regression. Furthermore, the FPN usually predicts small-scale objects in the shallow layers with low semantic information, which might not be enough to identify the class of the objects. The designers of the FPN were aware of this problem and adopted a top-down structure with lateral connections to fuse shallow layers and high-level semantic information to relieve it. However, if the small-scale objects disappear in the deep convolution layers, the context information cues will disappear at the same time.
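The FPN's top-down pathway with lateral connections described above can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not the paper's implementation: the nearest-neighbor upsampling, 1x1 lateral projection, and all function names and channel counts are assumptions.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbor 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def lateral_1x1(x, w):
    """1x1 convolution: project the channels of a (C_in, H, W) map
    with a weight matrix of shape (C_out, C_in)."""
    c, h, wd = x.shape
    return (w @ x.reshape(c, -1)).reshape(w.shape[0], h, wd)

def fpn_top_down(features, lateral_weights):
    """Build pyramid outputs from backbone maps ordered shallow -> deep.

    Each deeper map is upsampled and added to the lateral projection of
    the next shallower map: the FPN top-down pathway with lateral
    connections that propagates high-level semantics to shallow levels."""
    # Start from the deepest (most semantic) level.
    p = lateral_1x1(features[-1], lateral_weights[-1])
    outputs = [p]
    for feat, w in zip(reversed(features[:-1]), reversed(lateral_weights[:-1])):
        p = lateral_1x1(feat, w) + upsample2x(p)
        outputs.append(p)
    return outputs[::-1]  # shallow -> deep, e.g. [P2, P3, P4]
```

Each output level ends up with the same channel count while mixing in semantic information from every deeper level, which is why small objects predicted at shallow levels still receive high-level cues.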

