Abstract

Many modern CNNs feature complex architecture topologies with different layer types. One of these special layers is the fractionally-strided or transposed convolution (T-CONV) layer [1], an up-sampling layer that uses trained weights to produce enlarged, high-resolution feature maps. The atrous or dilated convolution (D-CONV) layer is another special layer that maintains the resolution and coverage of feature maps by expanding the receptive fields of the convolution filters, as discussed in [2]. Both T-CONV and D-CONV layers can be naïvely implemented as normal convolution (N-CONV) layers by inserting S′ − 1 zeros between adjacent pixels of the input feature maps (FMs) for T-CONV, or d − 1 zeros between adjacent values of the filters for D-CONV, where S′ is the T-CONV stride and d is the D-CONV dilation rate. This approach, however, severely underutilizes computation resources because of the zero-valued MAC operations it introduces.
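
The following is a minimal NumPy sketch of the naïve zero-insertion scheme described above, for the single-channel 2-D case. The helper names zero_insert_fm and zero_insert_filter are illustrative, not taken from the paper; padding and kernel-flipping details of the subsequent N-CONV pass are omitted.

```python
import numpy as np

def zero_insert_fm(fm, s_prime):
    """Insert s_prime - 1 zeros between adjacent pixels of a 2-D feature map,
    so a T-CONV of stride s_prime can be run as a normal convolution."""
    h, w = fm.shape
    out = np.zeros(((h - 1) * s_prime + 1, (w - 1) * s_prime + 1), dtype=fm.dtype)
    out[::s_prime, ::s_prime] = fm  # original pixels land on a stride-s_prime grid
    return out

def zero_insert_filter(filt, d):
    """Insert d - 1 zeros between adjacent taps of a square 2-D filter,
    so a D-CONV of dilation rate d can be run as a normal convolution."""
    k, _ = filt.shape
    out = np.zeros(((k - 1) * d + 1, (k - 1) * d + 1), dtype=filt.dtype)
    out[::d, ::d] = filt  # original taps land on a stride-d grid
    return out

# Example: a 3x3 FM up-sampled for a stride-2 T-CONV becomes a 5x5 map
# in which only 9 of the 25 entries are non-zero.
fm = np.arange(1, 10, dtype=np.float32).reshape(3, 3)
up = zero_insert_fm(fm, 2)
print(np.count_nonzero(up) / up.size)  # 0.36 -> ~64% of MAC operands are zero
```

As the example suggests, only about 1/S′² of the zero-inserted feature-map pixels (and, analogously, about 1/d² of the enlarged filter taps) are non-zero, which is exactly the source of the wasted zero MAC operations the abstract refers to.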
