Abstract

Big multimedia data is heterogeneous in essence, that is, the data may be a mixture of video, audio, text, and images. This is due to the prevalence of novel applications in recent years, such as social media, video sharing, and location based services (LBS), etc. In many multimedia applications, for example, video/image tagging and multimedia recommendation, text classification techniques have been used extensively to facilitate multimedia data processing. In this paper, we give a comprehensive review on feature selection techniques for text classification. We begin by introducing some popular representation schemes for documents, and similarity measures used in text classification. Then, we review the most popular text classifiers, including Nearest Neighbor (NN) method, Naive Bayes (NB), Support Vector Machine (SVM), Decision Tree (DT), and Neural Networks. Next, we survey four feature selection models, namely the filter, wrapper, embedded and hybrid, discussing pros and cons of the state-of-the-art feature selection approaches. Finally, we conclude the paper and give a brief introduction to some interesting feature selection work that does not belong to the four models.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.