Abstract
The convolutional neural network (CNN) has achieved good performance in object classification due to its inherent translation equivariance, but its scale equivariance is poor. A Scale-Aware Network (SA Net) with scale equivariance is proposed to estimate the scale during classification. The SA Net learns from samples at only one scale in the training stage; in the testing stage, the unknown-scale test samples are up-sampled and down-sampled to generate a group of image copies at different scales, which form an image pyramid. Up-sampling uses interpolation, and down-sampling uses interpolation combined with a wavelet transform to avoid spectrum aliasing. The generated test samples at different scales are fed to a Siamese network with weight sharing for inference. From the position of the maximum value in the classification-score matrix, a test sample is classified and its scale is estimated simultaneously. Results on the MNIST and FMNIST datasets show that the SA Net outperforms existing methods: when the scale is larger than 4, it achieves higher classification accuracy than other methods, and in the scale-estimation experiment it attains low relative RMSE at every scale. The SA Net has potential for effective use in remote sensing, optical image recognition, and medical diagnosis in cytohistology.
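As a concrete illustration of the test-time procedure described above, the following minimal PyTorch-style sketch (an approximation, not the authors' code; the name classify_and_estimate_scale, the candidate grid scales, and the bilinear resampling are illustrative assumptions) builds the scale pyramid, scores every copy with one shared-weight network, and reads the class and scale off the argmax of the score matrix:

    # Sketch of SA Net-style test-time inference (assumptions: `model` is a
    # trained classifier that accepts variable input sizes, e.g., it ends in
    # global average pooling; `scales` is an illustrative candidate grid).
    import torch
    import torch.nn.functional as F

    def classify_and_estimate_scale(model, image, scales=(0.5, 1.0, 2.0, 4.0)):
        """image: (1, C, H, W) tensor at an unknown scale.

        The paper down-samples with interpolation combined with a wavelet
        transform to avoid spectrum aliasing; plain bilinear interpolation
        stands in for that step in this sketch.
        """
        model.eval()
        rows = []
        with torch.no_grad():
            for s in scales:
                copy = F.interpolate(image, scale_factor=s,
                                     mode="bilinear", align_corners=False)
                rows.append(model(copy))            # (1, num_classes)
        scores = torch.cat(rows, dim=0)             # (num_scales, num_classes)
        # Argmax position encodes both the class and the best-matching scale.
        scale_idx, class_idx = divmod(scores.argmax().item(), scores.shape[1])
        return class_idx, scales[scale_idx]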
Highlights
The convolutional neural network (CNN) has achieved good performance in object detection and classification due to its inherent translation equivariance [1,2]
It can be seen that the Scale-Aware Network (SA Net) has good performance for scale estimation; for example, when the scale s is 4, δ can reach 13.9% on the MNIST Large-Scale dataset
The SA Net also works on the FMNIST Large-Scale dataset, which consists of grayscale images that are closer to real-world images than MNIST [33]
Summary
The convolutional neural network (CNN) has achieved good performance in object detection and classification due to its inherent translation equivariance [1,2]. No matter where an input feature appears, the CNN can detect it and predict the same label, and the data distribution remains approximately unchanged; in this case, the network has global translation invariance. For an input transformation T and a corresponding output transformation T′, the equivariance of a network Φ is written as:

Φ(Tx) = T′Φ(x)    (1)

When T′ is an identity transformation, invariance can be regarded as a special case of equivariance [3]. At this time, Equation (1) can be rewritten as:

Φ(Tx) = Φ(x)    (2)

Translation equivariance and invariance are inherent properties of the CNN and depend on its architecture. However, in many practical applications, scale equivariance is more critical. For example, in cytohistology the size of a cell reflects its state, so different cell sizes must be distinguishable among the outputs; a network that meets this requirement is a scale-equivariant CNN. In addition, in the real world, a change in the distance between an object and the observer directly produces a scale variation of the object's projection on the retina, resulting in the 'near large, far small' characteristic of human vision, which conveys the depth and volume of the object in relation to its size.
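To make Equations (1) and (2) concrete, the following toy numerical check (not from the paper; the circular shift T, the random kernel standing in for Φ, and the pooling step are illustrative assumptions) verifies that a convolution commutes with a circular translation, and that adding global average pooling yields translation invariance:

    # Toy check of Eq. (1) and Eq. (2) with T = circular shift, Φ = conv.
    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    x = torch.randn(1, 1, 8, 8)                       # input image x
    w = torch.randn(1, 1, 3, 3)                       # convolution kernel
    T = lambda t: torch.roll(t, shifts=2, dims=-1)    # translation T (= T′)
    # Circular padding keeps the convolution exactly shift-equivariant.
    conv = lambda t: F.conv2d(F.pad(t, (1, 1, 1, 1), mode="circular"), w)

    # Equation (1): Φ(Tx) = T′Φ(x)  -- translation equivariance
    print(torch.allclose(conv(T(x)), T(conv(x)), atol=1e-6))            # True

    # Equation (2): with global average pooling, Φ(Tx) = Φ(x) -- invariance
    pool = lambda t: t.mean(dim=(-2, -1))
    print(torch.allclose(pool(conv(T(x))), pool(conv(x)), atol=1e-6))   # True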