Enhancement of GUI Display Error Detection Using Improved Faster R-CNN and Multi-Scale Attention Mechanism

Xi Pan,Yimang Li,Yingying Cao,Zhan Huan

doi:10.3390/app14031144

Abstract

Graphical user interfaces (GUIs) hold an irreplaceable position in modern software and applications. Users can interact through them. Due to different terminal devices, there are sometimes display errors, such as component occlusion, image loss, text overlap, and empty values during software rendering. To address the aforementioned common four GUI display errors, a target detection algorithm based on the improved Faster R-CNN is proposed. Specifically, ResNet-50 is used instead of the traditional VGG-16 as the feature extraction network. The feature pyramid network (FPN) and the enhanced multi-scale attention (EMA) algorithm are introduced to improve accuracy. ROI-Align is used instead of ROI-Pooling to enhance the generalization capability of the network. Since training models require a large number of labeled screenshots of errors, there is currently no publicly available dataset with GUI display problems. Therefore, a training data generation algorithm has been developed, which can automatically generate screenshots with GUI display problems based on the Rico dataset. Experimental results show that the improved Faster R-CNN achieves a detection accuracy of 87.3% in the generated GUI problem dataset, which is a 7% improvement compared to the previous version.

Full Text