Abstract

Today's computing environments make it possible to carry out a variety of data-intensive natural language processing tasks. Language tokenization methods for multi-class text classification have recently been investigated by many data scientists. This paper investigates the Logistic Regression method by evaluating how classification accuracy depends on the size of the training data, part-of-speech (POS) features, and the number of n-grams. The Logistic Regression method is implemented in Apache Spark, an in-memory computing platform. Experimental results show that the multi-class classification method applied to Amazon product-review data achieves higher classification accuracy when POS features are used.
