Automatic Text Classification Research Articles

Topic discovery identifies the main ideas in large volumes of text. It surfaces recurring topics across documents and so provides an overview of the text. Current topic discovery models take text as input, with or without pre-processing such as stop word removal, text cleaning, and normalization (lowercase conversion). Applied to general-domain text, with or without pre-processing, topic discovery generates general topics. General topics do not offer a detailed overview of the input text, and manual text categorization is tedious and time-consuming. Automatic classification is therefore needed before topic extraction to generate specific topics whose top words maintain semantic relationships among them. This paper presents an approach that integrates text classification into topic discovery for large amounts of English text, such as the 20-Newsgroups and Reuters corpora. Automatic text classification runs before the topic discovery process, so that specific topics with semantically related top words are obtained for each class. Text classification analyzes the words that make up a document to decide which class or category it belongs to; the proposed integration then provides latent, specific topics, depicted by top words with high coherence, for each obtained class. Text classification is performed with a convolutional neural network (CNN) that incorporates an embedding model based on semantic relationships. Topic discovery over the categorized text is performed with the latent Dirichlet allocation (LDA), probabilistic latent semantic analysis (PLSA), and latent semantic analysis (LSA) algorithms. Topic discovery over categorized text was evaluated with the normalized topic coherence metric. The 20-Newsgroups corpus was classified, and twenty topics with ten top words each were identified for each class. The normalized topic coherence obtained was 0.1723 with LDA, 0.1622 with LSA, and 0.1716 with PLSA. The Reuters corpus was also classified, and twenty and fifty topics were identified. With 20 topics per class, the normalized topic coherence was 0.1441 with LDA, 0.1360 with LSA, and 0.1436 with PLSA.
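The abstract does not say how its normalized topic coherence is computed. A common choice is the mean normalized pointwise mutual information (NPMI) over pairs of a topic's top words, and the sketch below assumes that definition with document-level co-occurrence. The function name `npmi_coherence` and the toy documents are hypothetical, not taken from the paper.

```python
import math
from itertools import combinations


def npmi_coherence(top_words, documents):
    """Mean NPMI over all pairs of top words (document-level co-occurrence).

    Assumption: this is one common definition of normalized coherence;
    the paper may use a different variant.
    """
    if len(top_words) < 2:
        raise ValueError("need at least two top words")
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every given word.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            scores.append(-1.0)  # words never co-occur: lower bound of NPMI
        elif p12 == 1:
            scores.append(0.0)   # NPMI undefined here; treated as neutral (assumption)
        else:
            scores.append(math.log(p12 / (p1 * p2)) / -math.log(p12))
    return sum(scores) / len(scores)


# Hypothetical tokenized documents from one predicted class.
docs = [["space", "orbit"], ["space", "orbit"], ["hockey"], ["hockey"]]
print(npmi_coherence(["space", "orbit"], docs))
```

Scores range from -1 (the words never co-occur) to 1 (the words always co-occur), so a per-class score can be averaged over that class's topics to compare LDA, LSA, and PLSA on equal footing.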

Unintentional injury is the leading cause of death in young children. Emergency department (ED) diagnoses are a useful source of information for epidemiological injury surveillance. However, ED data collection systems often use free-text fields to report patient diagnoses. Machine learning techniques (MLTs) are powerful tools for automatic text classification, and an MLT system can improve injury surveillance by speeding up the manual coding of free-text ED diagnoses. This research aims to develop a tool for automatic free-text classification of ED diagnoses to automatically identify injury cases. The automatic classification system also serves epidemiological purposes by identifying the burden of pediatric injuries in Padua, a large province in the Veneto region in Northeast Italy. The study includes 283,468 pediatric admissions between 2007 and 2018 to the Padova University Hospital ED, a large referral center in Northern Italy. Each record reports a diagnosis as free text. The records are standard tools for reporting patient diagnoses. An expert pediatrician manually classified a randomly extracted sample of approximately 40,000 diagnoses. This study sample served as the gold standard to train an MLT classifier. After preprocessing, a document-term matrix was created. The machine learning classifiers, including decision tree, random forest, gradient boosting method (GBM), and support vector machine (SVM), were tuned by 4-fold cross-validation. The injury diagnoses were classified in 3 hierarchical classification tasks, according to the World Health Organization classification of injuries: injury versus noninjury (task A), intentional versus unintentional injury (task B), and type of unintentional injury (task C). The SVM classifier achieved the highest accuracy (94.14%) in classifying injury versus noninjury cases (task A). The GBM method produced the best results (92% accuracy) for the intentional versus unintentional injury classification task (task B). The highest accuracy for the unintentional injury subclassification (task C) was achieved by the SVM classifier. The SVM, random forest, and GBM algorithms performed similarly against the gold standard across the different tasks. This study shows that MLTs are promising techniques for improving epidemiological surveillance, allowing for the automatic classification of pediatric ED free-text diagnoses. The MLTs showed suitable classification performance, especially for general injury and intentional injury classification. This automatic classification could facilitate the epidemiological surveillance of pediatric injuries and reduce the effort health professionals spend manually classifying diagnoses for research purposes.
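Two steps named in the abstract, building a document-term matrix and splitting the labeled sample into 4 folds, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's pipeline: the whitespace tokenizer, the helper names `document_term_matrix` and `k_fold_indices`, and the sample diagnosis strings are all hypothetical.

```python
from collections import Counter


def document_term_matrix(texts):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in text i."""
    # Assumption: lowercase + whitespace split stands in for the real preprocessing.
    tokenized = [text.lower().split() for text in texts]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        matrix.append([counts[term] for term in vocab])
    return vocab, matrix


def k_fold_indices(n_samples, k=4):
    """Yield (train_indices, validation_indices) for each of k folds."""
    # Sample i goes to fold i % k, so fold sizes differ by at most one.
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# Hypothetical free-text diagnoses, for illustration only.
texts = ["head trauma", "fever", "trauma trauma"]
vocab, matrix = document_term_matrix(texts)
print(vocab)
print(matrix)

for train, validation in k_fold_indices(8, k=4):
    print(train, validation)
```

In a tuning loop, each candidate hyperparameter setting would be trained on the `train` rows of the matrix and scored on the `validation` rows, with the four scores averaged before the settings are compared.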

Articles published on Automatic Text Classification

Retraction Note: Automatic text classification using machine learning and optimization algorithms

Hazard Analysis for Massive Civil Aviation Safety Oversight Reports Using Text Classification and Topic Modeling

Detecting information from Twitter on landslide hazards in Italy using deep learning models

Automatic text classification of prostate cancer malignancy scores in radiology reports using NLP models

Automatic text classification of drug-induced liver injury using document-term matrix and XGBoost.

Daugiaklasių duomenų klasifikavimo metodų tyrimas (Investigation of Multiclass Data Classification Methods)

Development and application of financial statement filing robot based on RPA technology

Performance, Energy Consumption and Costs: A Comparative Analysis of Automatic Text Classification Approaches in the Legal Domain

Research on fault diagnosis of patient monitor based on text mining

Study of Text Patterns Found on Social Networks of Mental Health Reactions to COVID-19.

AUTOMATIC EVALUATION OF QUALITY OF EXAMS' QUESTIONS WRITTEN IN ARABIC LANGUAGE BASED ON BLOOM’S TAXONOMY: A SURVEY

Convolutional Neural Network Algorithm–Based Novel Automatic Text Classification Framework for Construction Accident Reports

CNN-Bi-LSTM Model for MOOC Forum Post Classification

Optimizing Automatic Text Classification Approach in Adaptive Online Collaborative Discussion–A Perspective of Attention Mechanism-Based Bi-LSTM

Adapting Feature Selection Algorithms for the Classification of Chinese Texts

Text classification by CEFR levels using machine learning methods and BERT language model

Text Study of Reader Magazine in the Context of Big Data

Integrating Text Classification into Topic Discovery Using Semantic Embedding Models

A Comparative Survey of Instance Selection Methods applied to Non-Neural and Transformer-Based Text Classification

Pediatric Injury Surveillance From Uncoded Emergency Department Admission Records in Italy: Machine Learning-Based Text-Mining Approach.
