Extracting Phrases as Software Features From Overlapping Sentence Clusters in Product Descriptions

Yong Cai, Chun Liu

doi:10.1109/access.2019.2962810

Abstract

Extracting software features from public product descriptions written in natural language is beneficial for developing new products. Because software features are often expressed as phrases, many current approaches define phrase patterns and extract matching phrases from product descriptions as features. However, these approaches often extract many noisy phrases, because public product descriptions are written freely by different designers and accurate phrase patterns are difficult to obtain in practice. Filtering the noisy phrases by frequency is also unsuitable, because some important features may be infrequent. To address these issues, this paper proposes a feature extraction approach that extracts phrases as features from sentence clusters among product descriptions rather than directly from the product descriptions. Considering that more than one feature can be described in one sentence, a new algorithm is designed to detect overlapping sentence clusters in public product descriptions. It can detect all potential sentence clusters and reduce the effect of noisy descriptions. Taking bigram collocations as the phrase pattern, the approach elicits the bigram collocations containing cluster keywords as features from each detected sentence cluster. Evaluations conducted on public software product descriptions from the application market Softpedia.com show that the proposed approach performs better than competing approaches in terms of precision and time consumption.
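The extraction step described in the abstract, keeping only the bigram collocations that contain a cluster keyword, can be illustrated with a minimal sketch. This is not the authors' implementation: the collocation measure is not stated on this page, so pointwise mutual information (PMI) is used here as an assumption, and the names tokenize, keyword_bigrams and min_count, as well as the sample sentences, are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def keyword_bigrams(sentences, keywords, min_count=2):
    """Return bigrams containing a cluster keyword, ranked by PMI.

    PMI is an assumed association measure, chosen only for illustration.
    """
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # too rare within this cluster to count as a collocation
        if w1 not in keywords and w2 not in keywords:
            continue  # keep only bigrams that contain a cluster keyword
        p_bigram = count / n_bi
        p_independent = (unigrams[w1] / n_uni) * (unigrams[w2] / n_uni)
        scored.append(((w1, w2), math.log(p_bigram / p_independent)))

    # Highest PMI first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


# Hypothetical sentence cluster whose keyword is "video".
cluster = [
    "Convert video files to many formats",
    "Batch convert video files quickly",
    "The tool can convert audio and video files",
]
features = keyword_bigrams(cluster, keywords={"video"})
```

Here features ranks ("video", "files") ahead of ("convert", "video"), and bigrams that occur only once in the cluster are dropped by the min_count threshold.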

Highlights

With the popularity of application markets such as Google Play, Apple store and Softpedia, large amounts of natural language data, such as product descriptions and user comments, have accumulated.
Study IV: With the incremental diffusive clustering (IDC) approach, overlapping sentence clusters are detected from the product descriptions and the sentences closest to the centroid of each cluster are selected as feature descriptors, whereas DSE detects overlapping sentence clusters but selects the frequent bigram collocations containing the cluster keywords as feature descriptors.
This approach integrates overlapping sentence cluster detection with bigram collocation extraction.

Summary

INTRODUCTION

With the popularity of application markets such as Google Play, Apple store and Softpedia, large amounts of natural language data, such as product descriptions and user comments, have accumulated. We propose a new approach named DSE (short for detect, select and extract) for feature extraction, with the aim of improving accuracy. It extracts phrases as features from the sentence clusters of the product descriptions rather than directly from the product descriptions, in order to reduce noisy phrases. The evaluation shows that the proposed approach is more accurate than both the competing approach of detecting overlapping sentence clusters and the approach of extracting bigram collocations directly from product descriptions. This indicates that extracting phrases from sentence clusters improves the accuracy of feature extraction.
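To make the notion of overlapping sentence clusters concrete, the sketch below groups sentences by shared terms, so that a sentence describing two features lands in two clusters. It is only an illustration of overlapping membership and is not the detection algorithm of DSE, which this summary does not describe. The stopword list, the function names and the sample sentences are all hypothetical.

```python
import re
from collections import defaultdict

# Small illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "and", "the", "to", "of", "in", "with", "for", "can", "your"}


def tokenize(sentence):
    """Lowercase a sentence, split it into alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", sentence.lower()) if t not in STOPWORDS]


def overlapping_clusters(sentences, min_size=2):
    """Group sentence indices by shared term.

    Each term that occurs in at least min_size sentences defines one cluster,
    so a single sentence may belong to several clusters at once.
    """
    by_term = defaultdict(set)
    for idx, sentence in enumerate(sentences):
        for term in set(tokenize(sentence)):
            by_term[term].add(idx)
    return {
        term: sorted(members)
        for term, members in by_term.items()
        if len(members) >= min_size
    }


# Hypothetical product description sentences.
sentences = [
    "Convert video files and burn them to DVD",
    "Burn data discs with one click",
    "Convert audio tracks to MP3",
]
clusters = overlapping_clusters(sentences)
```

In this example sentence 0 belongs to both the "convert" cluster and the "burn" cluster, which is the situation the approach is designed for: one sentence that describes more than one feature.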

BACKGROUND

PREPROCESS TEXTUAL DATA

DETECT OVERLAPPING SENTENCE CLUSTERS

Result

SELECT GIVEN NUMBER OF CLUSTERS

EVALUATION

CONCLUSION

Journal: IEEE Access
Publication Date: Jan 1, 2020
Citations: 2
License type: CC BY 4.0
