Text Classification Algorithms: A Survey

Kamran Kowsari, Mojtaba Heidarysafa, Donald Brown, Laura Barnes, Sanjana Mendu, Kiana Jafari Meimandi

doi:10.3390/info10040150

Open Access

https://doi.org/10.3390/info10040150

Abstract

In recent years, there has been exponential growth in the number of complex documents and texts, and classifying them accurately in many applications requires a deeper understanding of machine learning methods. Many machine learning approaches have achieved strong results in natural language processing. The success of these learning algorithms relies on their capacity to model complex, non-linear relationships within data. However, finding suitable structures, architectures, and techniques for text classification remains a challenge for researchers. This paper gives a brief overview of text classification algorithms. The overview covers text feature extraction methods, dimensionality reduction methods, existing algorithms and techniques, and evaluation methods. Finally, the limitations of each technique and their application to real-world problems are discussed.

Highlights

Text classification problems have been widely studied and addressed in many real applications [1,2,3,4,5,6,7,8] over the last few decades.
(I) Feature Extraction: In general, texts and documents are unstructured data sets. These unstructured text sequences must be converted into a structured feature space when using mathematical modeling as part of a classifier.
Accuracy and error (i.e., 1 − accuracy), on the other hand, are not widely used for text classification applications because they are insensitive to variations in the number of correct decisions due to the large value of the denominator (true positives (TP) + true negatives (TN)) [215].
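
The insensitivity of accuracy described in the last highlight can be illustrated with a small sketch. The confusion counts below are hypothetical (10,000 documents, 100 of them positive) and are not taken from the paper; the `metrics` helper is likewise an assumption introduced only for this illustration.

```python
def metrics(tp, fp, fn, tn):
    """Compute common evaluation metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "error": 1 - accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts: the same 9,890 true negatives in both cases,
# but very different numbers of correctly found positive documents.
weak = metrics(tp=10, fp=10, fn=90, tn=9890)
strong = metrics(tp=90, fp=10, fn=10, tn=9890)

for name, m in (("weak", weak), ("strong", strong)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Under these assumed counts, accuracy moves only from 0.99 to 0.998 because the true negatives dominate the total, while recall and F1 change substantially. This is one reason metrics based on precision and recall are often preferred for imbalanced text classification tasks.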

Summary

Introduction

Text classification problems have been widely studied and addressed in many real applications [1,2,3,4,5,6,7,8] over the last few decades. With recent breakthroughs in Natural Language Processing (NLP) and text mining, many researchers are interested in developing applications that leverage text classification methods. Most text classification and document categorization systems can be deconstructed into the following four phases: feature extraction, dimensionality reduction, classifier selection, and evaluation. We discuss the structure and technical implementations of text classification systems in terms of the pipeline illustrated in Figure 1 (the source code and the results are shared as free tools at https://github.com/kk7nc/Text_Classification). The initial pipeline input consists of a raw text data set. Text data sets contain sequences of text in documents as D = {X_1, X_2, ..., X_N}, where X_i refers to a data point (i.e., document, text segment) with s sentences such that each sentence includes w_s words with l_w letters. Each point is labeled with a class value from a set of k different discrete value indices [7].
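
The four phases named above can be sketched end to end in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy documents and labels are invented, document-frequency pruning stands in for a real dimensionality reduction method, and a nearest-centroid rule with cosine similarity stands in for the classifiers the survey covers. All function names are hypothetical.

```python
import math
from collections import Counter

# Toy labeled data set D = {X_1, ..., X_N}; texts and labels are invented.
train = [
    ("the team won the match", "sports"),
    ("a late goal decided the match", "sports"),
    ("the bank raised interest rates", "finance"),
    ("stock markets fell as rates rose", "finance"),
]
test = [("the match was close", "sports"), ("rates fell again", "finance")]


def tokenize(text):
    return text.lower().split()


def build_vocab(docs, min_df=2):
    """Phase 2 stand-in: keep only terms found in at least min_df documents."""
    df = Counter()
    for text, _ in docs:
        df.update(set(tokenize(text)))
    return sorted(term for term, n in df.items() if n >= min_df)


def extract(text, vocab):
    """Phase 1: map raw text to a bag-of-words count vector over vocab."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocab]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fit_centroids(docs, vocab):
    """Phase 3: average the feature vectors of each class."""
    sums, counts = {}, Counter()
    for text, label in docs:
        acc = sums.setdefault(label, [0.0] * len(vocab))
        for i, x in enumerate(extract(text, vocab)):
            acc[i] += x
        counts[label] += 1
    return {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}


def predict(text, vocab, centroids):
    vec = extract(text, vocab)
    return max(centroids, key=lambda lab: cosine(vec, centroids[lab]))


vocab = build_vocab(train)
centroids = fit_centroids(train, vocab)
predictions = [predict(text, vocab, centroids) for text, _ in test]

# Phase 4: evaluation (plain accuracy on the held-out documents).
accuracy = sum(p == y for p, (_, y) in zip(predictions, test)) / len(test)
print(vocab, predictions, accuracy)
```

In practice each phase would be replaced by a stronger method (for example weighted term features, PCA or random projection, and a trained classifier), but the data flow from raw text to a labeled prediction and an evaluation score stays the same.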

Journal: Information	Publication Date: Apr 23, 2019
Citations: 945	License type: CC BY 4.0
