Abstract

Term weighting is one of the most commonly used approaches, which works by assigning weights to terms, that aims to improve the performance of information retrieval or text categorization tasks. In this paper, we present a novel term weighting technique, called discriminative feature spamming technique (DFST), which identifies distinctive terms, based on a term utility criteria (TUC), and then spams them to increase their discriminative power. The experimental results show that the DFST outperformed a set of time-tested term weighting schemes, from the information retrieval field. All the experiments were performed on the largest ever Roman Urdu (RU) dataset of 11000 reviews, which was collected and annotated for this work. In addition, a custom tokenizer was built, which further improved classification accuracy. A cross-scheme comparison was performed, which showed that the results obtained by using the newly proposed DFST, were statistically significant and better than previous approaches.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call