Abstract

Extracting software features from public product descriptions written in natural language is beneficial for developing new products. Because software features are often expressed as phrases, many current approaches define phrase patterns and extract the matching phrases from product descriptions as features. However, such extraction often yields many noisy phrases, because public product descriptions are written freely by different designers and accurate phrase patterns are difficult to obtain in practice. Nor is it suitable to filter these noisy phrases by frequency, because some important features may be infrequent. To address these issues, this paper proposes a feature extraction approach that extracts phrases as features from sentence clusters among product descriptions rather than directly from the product descriptions. Because one sentence can describe more than one feature, a new algorithm is designed to detect overlapping sentence clusters in public product descriptions. It can detect all potential sentence clusters and reduce the effect of noisy descriptions. Taking bigram collocations as the phrase pattern, the approach elicits the bigram collocations that contain cluster keywords as features from each detected sentence cluster. Evaluations on public software product descriptions from the Softpedia.com application market show that the proposed approach achieves higher precision and lower time consumption than the competing approaches.
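The extraction step for a single sentence cluster can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The stopword list, the rule that a cluster's keywords are its top_n most frequent terms, and the use of raw bigram frequency in place of a statistical association measure such as PMI are all hypothetical choices.

```python
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "and", "or", "to", "of", "for", "in", "on",
             "with", "your", "you", "it", "is", "are", "can", "from", "all"}


def tokenize(sentence):
    """Lowercase, split on non-alphanumeric characters, drop stopwords."""
    words = "".join(c.lower() if c.isalnum() else " " for c in sentence).split()
    return [w for w in words if w not in STOPWORDS]


def cluster_keywords(token_lists, top_n=2):
    """Hypothetical keyword rule: the top_n most frequent terms in the cluster."""
    counts = Counter(w for tokens in token_lists for w in tokens)
    return {w for w, _ in counts.most_common(top_n)}


def extract_features(cluster_sentences, top_n_keywords=2, min_count=2):
    """Return (bigram, count) pairs for frequent adjacent-word bigrams that
    contain at least one cluster keyword, most frequent first."""
    token_lists = [tokenize(s) for s in cluster_sentences]
    keywords = cluster_keywords(token_lists, top_n_keywords)
    # Count adjacent word pairs inside each sentence (never across sentences).
    bigrams = Counter(
        (a, b) for tokens in token_lists for a, b in zip(tokens, tokens[1:])
    )
    kept = [
        (" ".join(bg), n) for bg, n in bigrams.items()
        if n >= min_count and (bg[0] in keywords or bg[1] in keywords)
    ]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


cluster = [
    "Convert PDF files to Word documents.",
    "Batch convert PDF files in one click.",
    "Supports converting PDF files to images.",
]
print(extract_features(cluster))  # [('pdf files', 3), ('convert pdf', 2)]
```

Here the keywords are "pdf" and "files", so only bigrams touching them survive. Replacing raw counts with an association measure would favour word pairs that co-occur more often than chance predicts.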

Highlights

  • With the popularity of application markets such as Google Play, Apple store and Softpedia, large amounts of natural language data, such as product descriptions and user comments, have been accumulated

  • Study IV: The incremental diffusive clustering (IDC) approach detects overlapping sentence clusters in the product descriptions and selects the sentences closest to each cluster centroid as feature descriptors, whereas DSE also detects overlapping sentence clusters but selects the frequent bigram collocations containing the cluster keywords as feature descriptors

  • This approach integrates overlapping sentence cluster detection with bigram collocation extraction
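The defining property of overlapping clusters, that one sentence may belong to several clusters, can be shown with a simple threshold-based assignment. This is a stand-in and not the paper's detection algorithm, which is not reproduced here. The seed phrases, the term-frequency cosine similarity, the stopword list and the threshold value are all assumptions made for this sketch.

```python
import math
from collections import Counter

# Hypothetical stopword list used only for this sketch.
STOPWORDS = {"a", "an", "the", "and", "to", "of", "for", "in", "with"}


def vectorize(sentence):
    """Term-frequency vector (a Counter) of a sentence, stopwords removed."""
    words = "".join(c.lower() if c.isalnum() else " " for c in sentence).split()
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(u, v):
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(n * v[w] for w, n in u.items())  # missing keys count as 0
    norm = (math.sqrt(sum(n * n for n in u.values()))
            * math.sqrt(sum(n * n for n in v.values())))
    return dot / norm if norm else 0.0


def overlapping_clusters(sentences, seeds, threshold=0.3):
    """Assign each sentence to EVERY seed it is similar enough to, not only
    the most similar one, so a sentence describing two features can join
    two clusters. Returns one list of sentence indices per seed."""
    vectors = [vectorize(s) for s in sentences]
    seed_vectors = [vectorize(s) for s in seeds]
    clusters = [[] for _ in seeds]
    for i, vec in enumerate(vectors):
        for k, seed_vec in enumerate(seed_vectors):
            if cosine(vec, seed_vec) >= threshold:
                clusters[k].append(i)
    return clusters


sentences = [
    "Convert PDF files to Word.",
    "Encrypt PDF files with a password.",
    "Protect documents with a strong password.",
]
print(overlapping_clusters(sentences, ["convert pdf", "encrypt password"]))
# [[0, 1], [1, 2]]
```

Sentence 1 mentions both PDF handling and password protection, so it lands in both clusters. A hard-partition method such as k-means would force it into only one.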


Summary

INTRODUCTION

With the popularity of application markets such as Google Play, Apple store and Softpedia, large amounts of natural language data, such as product descriptions and user comments, have been accumulated. We propose a new approach named DSE (short for detect, select and extract) for feature extraction, with the aim of improving accuracy. It extracts phrases as features from sentence clusters of the product descriptions, rather than directly from the product descriptions, in order to reduce noisy phrases. Validation shows that the proposed approach is more accurate than both a competing approach that detects overlapping sentence clusters and an approach that extracts bigram collocations directly from product descriptions. This indicates that extracting phrases from sentence clusters improves the accuracy of feature extraction.
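The three DSE stages can be wired together as in the sketch below. Every stage is a hypothetical simplification, not the paper's algorithm. Detection treats each term shared by at least min_size sentences as a possibly overlapping cluster. Selection keeps the k largest clusters. Extraction keeps the bigrams that contain the cluster's defining term and occur at least min_count times.

```python
from collections import Counter

# Hypothetical stopword list for the sketch.
STOPWORDS = {"a", "an", "the", "and", "to", "of", "for", "in", "with", "your", "it"}


def tokenize(sentence):
    """Lowercase, split on non-alphanumeric characters, drop stopwords."""
    words = "".join(c.lower() if c.isalnum() else " " for c in sentence).split()
    return [w for w in words if w not in STOPWORDS]


def detect(token_lists, min_size=2):
    """Detect (stand-in): each term occurring in >= min_size sentences defines
    a cluster of the sentence indices containing it; clusters may overlap."""
    members = {}
    for i, tokens in enumerate(token_lists):
        for term in set(tokens):
            members.setdefault(term, set()).add(i)
    return {t: ids for t, ids in members.items() if len(ids) >= min_size}


def select(clusters, k):
    """Select (stand-in): keep the k largest clusters, ties broken alphabetically."""
    ranked = sorted(clusters.items(), key=lambda item: (-len(item[1]), item[0]))
    return dict(ranked[:k])


def extract(token_lists, clusters, min_count=2):
    """Extract: per cluster, frequent bigrams containing the cluster keyword."""
    features = {}
    for keyword, ids in clusters.items():
        counts = Counter(
            bg
            for i in sorted(ids)
            for bg in zip(token_lists[i], token_lists[i][1:])
            if keyword in bg
        )
        features[keyword] = sorted(" ".join(bg) for bg, n in counts.items() if n >= min_count)
    return features


def dse(sentences, k=2):
    """Run detect -> select -> extract over raw description sentences."""
    token_lists = [tokenize(s) for s in sentences]
    return extract(token_lists, select(detect(token_lists), k))


descriptions = [
    "Convert PDF files to Word.",
    "Merge PDF documents in seconds.",
    "Encrypt PDF files with a password.",
    "Set a strong password to lock the app.",
    "Generate a strong password in one click.",
]
print(dse(descriptions, k=2))
# {'password': ['strong password'], 'pdf': ['pdf files']}
```

Sentence 2 belongs to both the "pdf" and "password" clusters. Each cluster contributes only the bigrams that recur around its own keyword, so one-off phrases such as "lock app" are never proposed as features.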

BACKGROUND
PREPROCESS TEXTUAL DATA
DETECT OVERLAPPING SENTENCE CLUSTERS
Result
SELECT GIVEN NUMBER OF CLUSTERS
EVALUATION
CONCLUSION