Detecting relevant design patterns from system design or source code helps software developers and maintainers understand the ideas behind the design of large-scale, highly complicated software systems, thereby improving the quality of software systems. Currently, design pattern detection based on machine learning has become a hot research direction. Scholars have proposed many design pattern detection methods based on machine learning. However, most of the existing literature only reports the utilization of traditional machine learning algorithms such as KNN, decision trees, ANN, SVM, etc., which require manual feature extraction and feature selection. It is very difficult to find suitable and effective features for the detection of design patterns. In the previous research, we have initially explored a design pattern detection method based on graph theory and ANN. Based on the research work done, we speculate that if we can realize the end-to-end design pattern detection from system design or source code to design pattern with the help of the powerful automatic feature extraction and other advantages of deep learning, the detection effect can be further improved. This paper intends to first explore a UML model that extends image information, called colored UML, so as to transform the design pattern detection problem into an image classification problem; on this basis, the positive and negative sample sets and the system to be recognized are all expressed in the form of colored UML models, the convolutional neural network VGGNet is used to train the data set to extract features, and the extracted features are trained by the SVM for binary classification to judge the pattern instances. Experiments were carried out on three open-source projects. We used three non-machine learning design pattern detection methods and five design pattern detection methods based on traditional machine learning algorithms, as well as the method in this paper. In general, the method proposed in this paper achieved higher precision and recall, and for different programs and their patterns, the precision and recall were stable at more than 85% in most cases. The experimental results demonstrate that this paper can achieve a better effect in recognizing design patterns. The research is, therefore, of both theoretical significance and application value.
Read full abstract