For pig instance segmentation, traditional computer vision techniques are constrained by sundry obstructions, overlap among pigs, and varying viewpoints in the pig breeding environment. In recent years, attention-based methods have achieved remarkable performance. In this paper, we introduce two types of attention blocks into the feature pyramid network (FPN) framework, which encode semantic interdependencies in the channel dimension (the channel attention block, CAB) and the spatial dimension (the spatial attention block, SAB), respectively. By integrating associated features, the CAB selectively emphasizes interdependencies among channels. Meanwhile, the SAB selectively aggregates the features at each position through a weighted sum of the features at all positions. A dual attention block (DAB) is proposed to flexibly integrate CAB features with SAB information. A total of 45 pigs in 8 pens are captured as the experimental subjects. In comparison with such state-of-the-art attention modules as the convolutional block attention module (CBAM), the bottleneck attention module (BAM), and spatial-channel squeeze & excitation (SCSE), embedding the DAB yields the most significant performance improvement across different task networks with distinct backbone networks. In particular, HTC-R101-DAB (hybrid task cascade with a ResNet-101 backbone) produces the best performance, with AP0.5 (average precision at IoU 0.5), AP0.75, AP0.5:0.95, and AP0.5:0.95-large reaching 93.1%, 84.1%, 69.4%, and 71.8%, respectively. Ablation experiments further indicate that the SAB contributes more than the CAB, and that the predictive results first increase and then decrease as the number of merged SABs grows. Moreover, visualization of the attention maps reveals that the attention blocks can extract regions with similar semantic information. The attention-based models also produce outstanding segmentation performance on a public dataset, which demonstrates the practicality of our attention blocks. Our baseline models are available at https://github.com/zhiweihu1103/pig-instance-segmentation.
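To make the mechanism concrete, the following is a minimal PyTorch sketch of one plausible reading of the SAB, CAB, and DAB as described above (position-wise and channel-wise self-attention with a simple additive fusion). The class names, the channel-reduction ratio, and the summation-based fusion are our assumptions for illustration, not the authors' released implementation, which is available at the linked repository.

```python
# Hedged sketch: a DANet-style interpretation of the SAB/CAB/DAB described in
# the abstract. Names and fusion strategy are illustrative assumptions.
import torch
import torch.nn as nn


class SpatialAttentionBlock(nn.Module):
    """SAB: re-weights each position by a weighted sum over all positions."""

    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)  # (B, HW, C//8)
        k = self.key(x).flatten(2)                    # (B, C//8, HW)
        attn = torch.softmax(q @ k, dim=-1)           # (B, HW, HW) affinities
        v = self.value(x).flatten(2)                  # (B, C, HW)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x


class ChannelAttentionBlock(nn.Module):
    """CAB: models inter-channel dependencies via a channel affinity map."""

    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        flat = x.flatten(2)                                         # (B, C, HW)
        attn = torch.softmax(flat @ flat.transpose(1, 2), dim=-1)   # (B, C, C)
        out = (attn @ flat).view(b, c, h, w)
        return self.gamma * out + x


class DualAttentionBlock(nn.Module):
    """DAB: fuses the two attention outputs (here by simple summation)."""

    def __init__(self, channels):
        super().__init__()
        self.sab = SpatialAttentionBlock(channels)
        self.cab = ChannelAttentionBlock()

    def forward(self, x):
        return self.sab(x) + self.cab(x)


# Usage example: apply the DAB to one FPN level (256 channels is typical).
feats = torch.randn(2, 256, 32, 32)
print(DualAttentionBlock(256)(feats).shape)  # torch.Size([2, 256, 32, 32])
```

In this reading, both blocks start as identity mappings (gamma initialized to zero) and gradually learn how strongly to mix in the attended features, which keeps pretrained FPN behavior intact early in training.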