Abstract

Term weighting methods assign appropriate weights to the terms in a document so that more important terms receive higher weights for the text representation. In this study, we consider four term weighting and three feature selection methods and investigate how these term weighting methods respond to the reduced text representation. We conduct experiments on five Turkish review datasets so that we can establish baselines and compare the performance of these term weighting methods. We test these methods on the English reviews so that we can identify their differences with the Turkish reviews. We show that both tf and tp weighting methods are the best for the Turkish, while tp is the best for the English reviews. When feature selection is applied, tf * idf method with DFD and χ2 has the highest accuracies for the Turkish, while tf * idf and tp methods with χ2 have the best performance for the English reviews.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call