Google Play Content Scraping and Knowledge Engineering using Natural Language Processing Techniques with the Analysis of User Reviews

Hamza Aldabbas, Muhammad Farhan, Rana M Amir Latif, Abdullah Bajahzar, Ali Adil Qureshi, Meshrif Alruily

doi:10.1515/jisys-2019-0197

Abstract

Maintaining a competitive edge in the mobile application market requires evaluating what makes a quality app. User feedback on these applications plays an essential role in the mobile application development industry. The rapid growth of web technology has given people an opportunity to interact, review, rate, and share their feedback about applications. In this paper, we scraped 506,259 user reviews and application ratings from the Google Play Store across 14 different categories. Statistical results were obtained using several common machine learning algorithms, namely Logistic Regression, Random Forest Classifier, and Multinomial Naïve Bayes. Accuracy, precision, recall, and F1 score were used to evaluate bigram, trigram, and n-gram features, and the statistical results of these algorithms were compared. Each algorithm is analyzed in turn, and its results are evaluated. It is concluded that logistic regression is the best algorithm for review analysis of Google Play Store applications. The results have been checked scientifically, with the accuracy of the logistic regression algorithm assessed for classifying reviews into three classes, i.e., positive, negative, and neutral.
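As an illustration of the four evaluation measures named above, the following is a minimal sketch for the three-class setting (positive, negative, neutral). It is not the authors' evaluation code: the function names `per_class_scores` and `evaluate` are hypothetical, the labels in the usage note are invented, and macro-averaging over the three classes is an assumption, since the abstract does not state which averaging scheme was used.

```python
LABELS = ("positive", "negative", "neutral")


def per_class_scores(y_true, y_pred, label):
    """Precision, recall, and F1 for one class, treated one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


def evaluate(y_true, y_pred, labels=LABELS):
    """Accuracy plus macro-averaged precision, recall, and F1.

    Macro-averaging (an unweighted mean over classes) is an assumption
    made for this sketch, not something stated in the paper.
    """
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    scores = [per_class_scores(y_true, y_pred, lab) for lab in labels]
    n = len(labels)
    precision, recall, f1 = (sum(s[i] for s in scores) / n for i in range(3))
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

For example, `evaluate(["positive", "positive", "negative", "neutral"], ["positive", "negative", "negative", "neutral"])` scores one misclassified positive review out of four, so accuracy is 0.75 while the macro-averaged F1 is lower than the macro-averaged precision and recall.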

Highlights

In natural language processing, classifying documents and strings into different categories is considered a vital task
Statistical results were obtained using several common machine learning algorithms, namely Logistic Regression, Random Forest Classifier, and Multinomial Naïve Bayes
This section addresses the evaluation of the scraped dataset using different machine learning algorithms, namely Logistic Regression, Multinomial Naïve Bayes, and Random Forest

Summary

Introduction

In natural language processing, classifying documents and strings into different categories is considered a vital task. Text classification of online information has gained an important role in recent years. It has been used to classify email as spam and to detect users' sentiments in comments or tweets [1]. Automatic tagging of customer queries, classification of blogs into different categories, and working with small training datasets remain difficult. Learners find it extremely challenging to generalize in text classification.
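To make the text classification task concrete, the sketch below extracts word-level n-gram features from review text and trains a small multinomial Naïve Bayes classifier over three sentiment classes. It is an illustrative toy under stated assumptions, not the pipeline used in the paper: the reviews in `TRAIN` are invented, the names `ngrams`, `features`, and `TinyMultinomialNB` are hypothetical, and whitespace tokenization, unigram-plus-bigram features, and Laplace smoothing with `alpha=1.0` are choices made only for this example.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n):
    """Word-level n-grams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(text, max_n=2):
    """All n-grams from unigrams up to max_n-grams (bigrams by default)."""
    feats = []
    for n in range(1, max_n + 1):
        feats.extend(ngrams(text, n))
    return feats


class TinyMultinomialNB:
    """Multinomial Naive Bayes with Laplace smoothing (alpha is assumed)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(features(text))
        self.vocab = {f for c in self.feature_counts.values() for f in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Log prior plus smoothed log likelihood of each known feature.
            score = math.log(doc_count / total_docs)
            for feat in features(text):
                if feat in self.vocab:  # features unseen in training are skipped
                    score += math.log((counts[feat] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented reviews, for illustration only.
TRAIN = [
    ("great app love it", "positive"),
    ("love this great game", "positive"),
    ("terrible app keeps crashing", "negative"),
    ("crashing all the time terrible", "negative"),
    ("it is an app", "neutral"),
    ("this is a game", "neutral"),
]

model = TinyMultinomialNB().fit(
    [text for text, _ in TRAIN], [label for _, label in TRAIN]
)
```

With this toy training set, `model.predict("love it great")` returns `"positive"` and `model.predict("terrible crashing")` returns `"negative"`. A dataset of six reviews says nothing about generalization, which is the difficulty the paragraph above points to for small training sets.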

Methods

Results

Conclusion