Abstract

Android-based applications are used by people all around the globe. Because Internet access is available almost everywhere, often at no charge, almost half of the world's population is engaged in social networking, social media surfing, messaging, browsing and plugins. The Google Play Store, one of the most popular Internet application stores, offers users thousands of applications and various types of software to download. In this research study, we scraped user reviews and ratings of 148 applications from 14 different categories. A total of 506,259 reviews were accumulated and assessed. Based on the semantics of the reviews, each review was classified as negative, positive or neutral. Different machine-learning algorithms, namely logistic regression, random forest and naïve Bayes, were tuned and tested. We also evaluated the outcome of term frequency (TF) and inverse document frequency (IDF), measured accuracy, precision, recall and F1 score (F1), and present the results in the form of a bar graph. In conclusion, we compared the outcome of each algorithm and found that logistic regression is one of the best algorithms for review analysis of the Google Play Store from an accuracy perspective. Furthermore, our results show that logistic regression is better in terms of speed, accuracy, recall and F1. This conclusion was reached after preprocessing a number of data values from these data sets.
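As a rough illustration of the evaluation measures named above, the following Python sketch computes accuracy and the per-class precision, recall and F1 score for a three-class sentiment task. The labels and predictions are made-up toy values, not data from this study, and the function names are hypothetical. How the study averaged these measures across classes is not stated here, so only per-class values are shown.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, label):
    """Precision, recall and F1 for one class, treated as 'one vs. rest'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)  # true positives
    fp = sum(t != label and p == label for t, p in pairs)  # false positives
    fn = sum(t == label and p != label for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels for illustration only (not taken from the study's data set).
y_true = ["positive", "negative", "neutral", "positive", "negative", "positive"]
y_pred = ["positive", "negative", "positive", "positive", "neutral", "negative"]

acc = accuracy(y_true, y_pred)
pos_precision, pos_recall, pos_f1 = precision_recall_f1(y_true, y_pred, "positive")
print(acc, pos_precision, pos_recall, pos_f1)
```

F1 is the harmonic mean of precision and recall, so it is high only when both are high, which is why it is commonly reported alongside accuracy for imbalanced review data.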

Highlights

  • In an information era where large amounts of data must be processed every day, minute and second, and where there is huge demand for computers fast enough to produce accurate results within nanoseconds, it is said that approximately 2.5 quintillion bytes of data are generated daily, manually or automatically, by different tools and applications

  • On a term frequency (TF)/inverse document frequency (IDF) basis, we showed that the logistic regression algorithm had a 0.621%

  • We evaluated the results using different machine-learning algorithms, namely naïve Bayes, random forest and logistic regression, which can check the semantics of users' reviews of some applications to determine whether the reviews were good, bad, average, etc


Summary

Introduction

In an information era where large amounts of data must be processed every day, minute and second, and where there is huge demand for computers fast enough to produce accurate results within nanoseconds, it is said that approximately 2.5 quintillion bytes of data are generated daily, manually or automatically, by different tools and applications. This illustrates the importance of text-mining techniques in handling and classifying data in a meaningful way. We applied various algorithms and text-classification techniques to Android application reviews [2].
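To make the text-mining step more concrete, the sketch below shows one common way to turn review text into TF-IDF weights before a classifier is trained on it. It is a minimal illustration under stated assumptions: the tokenizer is a plain whitespace split, the weighting variant is tf = count / document length and idf = ln(N / df), the `tf_idf` function name is hypothetical, and the three reviews are invented examples. The exact preprocessing and weighting used in the study may differ.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Invented example reviews (not taken from the study's data set).
reviews = [
    "great app love it",
    "bad app crashes often",
    "app is okay",
]
weights = tf_idf(reviews)
for review, w in zip(reviews, weights):
    print(review, w)
```

In this variant a word that appears in every review, such as "app" here, receives a weight of zero, while words specific to one review, such as "great" or "crashes", receive higher weights. This is the property that makes TF-IDF features useful for separating positive, negative and neutral reviews.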

