A DistilBERT-based hierarchical text classification for traffic analysis

Similar Papers
  • Book Chapter
  • Cited by 2
  • 10.1007/978-3-030-74296-6_21
Effective Seed-Guided Topic Labeling for Dataless Hierarchical Short Text Classification
  • Jan 1, 2021
  • Yi Yang + 5 more

Hierarchical text classification, which aims to classify texts into a given hierarchy, has broad application prospects on the Internet. Supervised methods require a large amount of labeled data and are thus costly. For this reason, dataless hierarchical text classification, which requires only a few relevant seed words for the given categories, has attracted increasing attention from researchers in recent years. However, existing approaches mainly focus on long texts without considering the characteristics of short texts, and so are unsuitable in many scenarios. In this paper, we tackle dataless hierarchical short text classification for the first time and propose a model named Hierarchical Seeded Biterm Topic Model (HierSeedBTM), which effectively leverages seed words in the Biterm Topic Model (BTM) to guide hierarchical topic labeling. Specifically, our model introduces an iterative distribution propagation mechanism among topic models at different levels to incorporate hierarchical structure information. Experiments on two public datasets show that the proposed model is more effective than state-of-the-art dataless hierarchical text classification methods designed for long texts.
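The abstract above builds on biterms, the basic unit of the Biterm Topic Model (BTM). As a rough sketch of that representation only (not HierSeedBTM or its propagation mechanism), the Python below enumerates biterms, meaning unordered word pairs that co-occur in one short text, and pools them over the corpus. `seed_hits` and the category names are hypothetical, included only to suggest how seed words could be matched against biterms.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Biterm = Tuple[str, str]

def extract_biterms(tokens: List[str]) -> List[Biterm]:
    """Return one biterm for every unordered pair of token positions in a short text."""
    # Sorting each pair makes (x, y) and (y, x) count as the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]

def corpus_biterm_counts(docs: Iterable[List[str]]) -> Counter:
    """Pool biterms over the whole corpus rather than keeping them per document."""
    counts: Counter = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts

def seed_hits(biterm: Biterm, seed_words: Dict[str, Set[str]]) -> Dict[str, int]:
    """Hypothetical helper: how many words of a biterm are seed words of each category."""
    return {cat: sum(w in seeds for w in biterm) for cat, seeds in seed_words.items()}
```

Pooling pairs corpus-wide is what gives a topic model more co-occurrence evidence than a few words per short text would on their own.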

  • Book Chapter
  • Cited by 11
  • 10.1007/978-1-4615-0435-1_14
Hierarchical Text Classification Methods and Their Specification
  • Jan 1, 2003
  • Aixin Sun + 2 more

Hierarchical text classification refers to assigning text documents to the categories in a given category tree based on their content. With a large number of categories organized as a tree, hierarchical text classification helps users find information more quickly and accurately. Nevertheless, hierarchical text classification methods have often been constructed in a proprietary manner in the past. The construction steps often involve human effort and are not completely automated. In this chapter, we therefore propose a specification language known as HCL (Hierarchical Classification Language). HCL is designed to describe a hierarchical classification method, including the definition of a category tree and the training of classifiers associated with the categories. Using HCL, a hierarchical classification method can be materialized easily with the help of a method generator system.

Keywords: Hierarchical Text Classification; Specification Language
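HCL's own syntax is not given here. To illustrate the kind of method it describes (a category tree plus a classifier attached to each category), the following is a minimal top-down sketch in Python, assuming every internal node has a local classifier that picks exactly one child. The tree, the keyword-rule classifiers in the usage, and `classify_top_down` are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical category tree: each internal node maps to its list of children.
CategoryTree = Dict[str, List[str]]
# A local classifier chooses one child category for a document.
NodeClassifier = Callable[[str], str]

def classify_top_down(text: str, tree: CategoryTree,
                      classifiers: Dict[str, NodeClassifier],
                      root: str = "root") -> List[str]:
    """Route a document from the root to a leaf, one local decision per level."""
    path: List[str] = []
    node = root
    while node in tree:  # stop once the current node has no children
        child = classifiers[node](text)
        if child not in tree[node]:
            raise ValueError(f"classifier for {node!r} returned unknown child {child!r}")
        path.append(child)
        node = child
    return path
```

For example, with `tree = {"root": ["traffic", "weather"], "traffic": ["accident", "congestion"]}` and keyword rules at the two internal nodes, a text about a crash on a road would be routed to `["traffic", "accident"]`. A mistake near the root cannot be corrected further down the path, which is the usual weakness of this local, top-down design.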

  • Research Article
  • 10.2478/acss-2025-0005
Hierarchical Text Classification: Fine-tuned GPT-2 vs BERT-BiLSTM
  • Jan 1, 2025
  • Applied Computer Systems
  • Djelloul Bouchiha + 4 more

Hierarchical Text Classification (HTC) is a specialised task in natural language processing that involves categorising text into a hierarchical structure of classes. This approach is particularly valuable in several domains, such as document organisation, sentiment analysis, and information retrieval, where classification schemas naturally form hierarchical structures. In this paper, we propose and compare two deep learning-based models for HTC. The first model involves fine-tuning GPT-2, a large language model (LLM), specifically for hierarchical classification tasks. Fine-tuning adapts GPT-2’s extensive pre-trained knowledge to the nuances of hierarchical classification. The second model leverages BERT for text preprocessing and encoding, followed by a BiLSTM layer for the classification process. Experimental results demonstrate that the fine-tuned GPT-2 model significantly outperforms the BERT-BiLSTM model in accuracy and F1 scores, underscoring the advantages of using advanced LLMs for hierarchical text classification.

  • Book Chapter
  • Cited by 4
  • 10.1007/978-3-030-61616-8_60
A Hierarchical Fine-Tuning Approach Based on Joint Embedding of Words and Parent Categories for Hierarchical Multi-label Text Classification
  • Jan 1, 2020
  • Yinglong Ma + 2 more

Text classification has become increasingly challenging as classification labels grow finer-grained and label sets grow larger. To address this, some research has turned to strategies that exploit the hierarchical structure of problems with a large number of categories. At present, hierarchical text classification (HTC) has received extensive attention and has broad application prospects. Making full use of the relationship between parent and child categories can greatly improve classification performance. In this paper, we propose a joint embedding of text and parent category based on hierarchical fine-tuning ordered neurons LSTM (HFT-ONLSTM) for HTC. Our method makes full use of the connection between upper-level and lower-level labels. Experiments show that our model outperforms the state-of-the-art hierarchical model at a lower computational cost.

  • Research Article
  • Cited by 29
  • 10.1016/j.eij.2020.08.004
HMATC: Hierarchical multi-label Arabic text classification model using machine learning
  • Sep 22, 2020
  • Egyptian Informatics Journal
  • Nawal Aljedani + 2 more

  • Research Article
  • Cited by 47
  • 10.1016/j.eswa.2021.115905
Hybrid embedding-based text representation for hierarchical multi-label text classification
  • Sep 20, 2021
  • Expert Systems with Applications
  • Yinglong Ma + 5 more

  • Conference Article
  • Cited by 4
  • 10.1109/nnice58320.2023.10105736
Hierarchical Multi-label Text Classification Method Based On Multi-level Decoupling
  • Feb 24, 2023
  • Qingwu Fan + 1 more

To facilitate subsequent processing, the government hotline assigns hierarchical labels to the complaint and report texts it collects. Hierarchical multi-label text classification is a challenging task. Most previous studies treat it as a flat multi-label classification problem, ignoring the constraints and connections between hierarchical labels. In this paper, we formulate hierarchical multi-label text classification as a sequence generation task and propose a sequence-to-sequence-based hierarchical multi-label text classification model. The model mainly uses multi-level decoupling to make better use of the connections between hierarchical labels, turning the constraints between labels into usable information that aids classification. Compared with other models, the proposed model performs significantly better in classifying wading-related complaints and reports received by Beijing's 12345 hotline.
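Treating a label path as a generated sequence makes it possible to enforce the parent-child constraint while decoding. The Python sketch below shows one generic way to do this, greedy decoding restricted to the children of the previously emitted label; it is not the multi-level decoupling method from the abstract, and `step_scores` is a hypothetical stand-in for the per-step scores a trained decoder would produce.

```python
from typing import Dict, List

def constrained_greedy_decode(step_scores: List[Dict[str, float]],
                              tree: Dict[str, List[str]],
                              root: str = "<root>") -> List[str]:
    """Greedily decode a label path, allowing only children of the previous label.

    step_scores[i] maps candidate labels to (hypothetical) decoder scores at step i.
    """
    path: List[str] = []
    parent = root
    for scores in step_scores:
        allowed = tree.get(parent, [])
        if not allowed:  # reached a leaf, so the path is complete
            break
        # Labels outside the allowed children are never considered,
        # even if the decoder scores them highest.
        best = max(allowed, key=lambda label: scores.get(label, float("-inf")))
        path.append(best)
        parent = best
    return path
```

With `"environment"` and `"traffic"` under the root, a second-step score that favors `"road"` (a child of `"traffic"`) is ignored once `"environment"` has been chosen, so the output is always a valid root-to-node path.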

  • Research Article
  • Cited by 33
  • 10.1016/j.ipm.2017.10.003
Beyond vector space model for hierarchical Arabic text classification: A Markov chain approach
  • Oct 23, 2017
  • Information Processing & Management
  • Fawaz S Al-Anzi + 1 more

  • Research Article
  • 10.2478/awutm-2026-0001
Hierarchical Arabic text classification: deep learning-based approach
  • Jan 1, 2025
  • Annals of West University of Timisoara - Mathematics and Computer Science
  • Djelloul Bouchiha + 7 more

Text classification is the task of assigning textual data to predefined categories, playing a crucial role in natural language processing. In recent years, deep learning models have demonstrated superior performance over traditional machine learning approaches in text classification tasks. This paper presents a supervised deep learning approach for hierarchical Arabic text classification. To facilitate this study, we developed WiHArD, a novel hierarchical Arabic text dataset, where each text is systematically labeled according to a structured category hierarchy. We then propose a deep learning model that integrates BERT-based feature extraction with a neural network classifier. BERT encodes textual inputs into dense vector representations, while the neural network learns to accurately classify texts within the hierarchical structure. Our comparative study demonstrates that the proposed BERT-ANN model achieves significant improvements in hierarchical classification performance, outperforming the existing HMATC model. These findings highlight the effectiveness of deep learning-based approaches in advancing Arabic text classification.

  • Conference Article
  • Cited by 30
  • 10.24963/ijcai.2020/619
F-HMTC: Detecting Financial Events for Investment Decisions Based on Neural Hierarchical Multi-Label Text Classification
  • Jul 1, 2020
  • Xin Liang + 5 more

The share prices of listed companies in the stock trading market are prone to being influenced by various events. Event detection could help people identify, in a timely manner, the investment risks and opportunities accompanying these events. Financial events inherently have hierarchical structures, which can be represented as tree-structured schemes in real-life applications, so event detection can be modeled as a hierarchical multi-label text classification problem in which an event is assigned to a tree node with a sequence of hierarchical event category labels. Conventional hierarchical multi-label text classification methods usually ignore the hierarchical relationships in the event classification scheme and treat the hierarchical labels associated with an event as uniform labels, so correct and wrong label predictions receive equal rewards or penalties. In this paper, we propose a neural hierarchical multi-label text classification method, F-HMTC, for a financial application scenario with a massive number of event category labels. F-HMTC learns latent features with bidirectional encoder representations from transformers (BERT) and maps them directly to hierarchical labels with a carefully designed hierarchy-based loss layer. We conduct extensive experiments on a private financial dataset with carefully annotated labels, and F-HMTC consistently outperforms state-of-the-art baselines by substantial margins. We will release both the source code and the dataset in the first author's repository.
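The abstract contrasts uniform penalties with hierarchy-aware ones. One generic way to make the cost of an error depend on the hierarchy is to measure how far the predicted label lies from the gold label in the tree. The Python sketch below computes that edge distance through the lowest common ancestor; it illustrates the idea under that assumption and is not F-HMTC's hierarchy-based loss layer. The event labels are invented.

```python
from typing import Dict, List

def path_to_root(node: str, parent: Dict[str, str]) -> List[str]:
    """Return [node, parent, grandparent, ..., root]."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def tree_distance(a: str, b: str, parent: Dict[str, str]) -> int:
    """Number of edges between labels a and b, via their lowest common ancestor."""
    pa, pb = path_to_root(a, parent), path_to_root(b, parent)
    depth_in_a = {n: i for i, n in enumerate(pa)}
    for j, n in enumerate(pb):
        if n in depth_in_a:  # first shared node walking up from b is the LCA
            return depth_in_a[n] + j
    raise ValueError("labels are not in the same tree")
```

In a depth-2 tree where `"merger"` and `"bankruptcy"` sit under `"corporate"` and `"dividend"` sits under `"shareholder"`, confusing the two siblings costs 2 while crossing branches costs 4, so such a distance could be used to scale a per-example loss instead of a flat 0/1 penalty.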

  • Conference Article
  • Cited by 51
  • 10.18653/v1/w17-2339
Initializing neural networks for hierarchical multi-label text classification
  • Jan 1, 2017
  • Simon Baker + 1 more

Many tasks in the biomedical domain require the assignment of one or more predefined labels to input text, where the labels are part of a hierarchical structure (such as a taxonomy). The conventional approach is a one-vs.-rest (OVR) classification setup, where a binary classifier is trained for each label in the taxonomy or ontology and all instances not belonging to that class are treated as negative examples. The main drawbacks of this approach are that dependencies between classes are not leveraged in training and classification, and that training parallel classifiers adds computational cost. In this paper, we apply a new method for hierarchical multi-label text classification that initializes the final hidden layer of a neural network model so that it leverages label co-occurrence relations such as hypernymy. This approach lends itself naturally to hierarchical classification. We evaluated it on two hierarchical multi-label text classification tasks in the biomedical domain, using both sentence- and document-level classification. Our evaluation shows promising results for this approach.
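A rough Python sketch of the initialization idea described above, assuming label co-occurrence patterns are given as lists of label indices (for example, a leaf label plus its ancestors). Each of the first hidden units is dedicated to one pattern, with a constant weight to the labels in that pattern and 0 elsewhere; the remaining units are initialized randomly. Using the Glorot-uniform bound as that constant is an assumption for illustration and may differ from the paper's exact choice; `init_output_weights` is a hypothetical name.

```python
import math
import random
from typing import List, Sequence

def init_output_weights(patterns: Sequence[Sequence[int]], n_hidden: int,
                        n_labels: int, seed: int = 0) -> List[List[float]]:
    """Weights from the final hidden layer (rows) to the output labels (columns)."""
    if len(patterns) > n_hidden:
        raise ValueError("need at least one hidden unit per co-occurrence pattern")
    bound = math.sqrt(6.0 / (n_hidden + n_labels))  # Glorot-uniform limit
    rng = random.Random(seed)
    # Start from ordinary random initialization for every hidden unit.
    weights = [[rng.uniform(-bound, bound) for _ in range(n_labels)]
               for _ in range(n_hidden)]
    # Dedicate one hidden unit to each pattern: constant weight to its labels, 0 elsewhere.
    for row, pattern in enumerate(patterns):
        members = set(pattern)
        weights[row] = [bound if j in members else 0.0 for j in range(n_labels)]
    return weights
```

Plain lists keep the sketch framework-free; in practice these values would be copied into the weight matrix of the network's final layer before training.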

  • Conference Article
  • Cited by 2
  • 10.1145/2695664.2695755
Approximate block coordinate descent for large scale hierarchical classification
  • Apr 13, 2015
  • Anveshi Charuvaka + 1 more

In the real world, we often encounter hierarchical classification problems with a large number of categories and deep hierarchies. In addition, the majority of categories do not have enough examples to train classifiers that generalize well. The feature space is usually large as well, especially for text classification problems. Binary, multi-class, or multi-label approaches that treat hierarchical classification as a flat classification problem, disregarding the hierarchical relationships, fail to leverage the relatedness of the categories during learning and consequently perform poorly. Several approaches for hierarchical classification have been proposed in the literature, but most are not scalable enough to address large-scale classification problems. In this paper, we study a hierarchical classification method that addresses large-scale classification within a regularized risk minimization framework. Specifically, the method exploits hierarchical relationships between categories by imposing the constraint that the learned model vector for a category should be similar to that of its parent category. We study and analyze an approximate block coordinate descent procedure and compare its performance to a previously proposed exact coordinate descent method for this problem. We further examine the performance of this method on various aspects of the hierarchical classification problem using large hierarchical text classification datasets.
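One natural way to write the constraint that a category's model vector should stay close to its parent's is a penalty lam/2 * sum over non-root nodes n of ||w_n - w_parent(n)||^2, added to the classification loss. The Python below only evaluates that penalty; it assumes this squared-distance form and does not implement the paper's approximate block coordinate descent. `lam` and the toy vectors in the usage are illustrative.

```python
from typing import Dict, List

Vector = List[float]

def squared_distance(u: Vector, v: Vector) -> float:
    """||u - v||^2 for equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def hierarchy_penalty(weights: Dict[str, Vector], parent: Dict[str, str],
                      lam: float = 1.0) -> float:
    """lam/2 * sum over non-root nodes n of ||w_n - w_parent(n)||^2."""
    return 0.5 * lam * sum(squared_distance(weights[n], weights[p])
                           for n, p in parent.items())
```

With a root vector `[0, 0]` and children `[1, 0]` and `[0, 2]`, the penalty at `lam=1` is 0.5 * (1 + 4) = 2.5. Because each term couples only a node and its parent, the objective splits naturally into per-node blocks, which is what makes coordinate-wise updates attractive at scale.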

  • Research Article
  • Cited by 7
  • 10.1016/j.datak.2024.102281
A new sentence embedding framework for the education and professional training domain with application to hierarchical multi-label text classification
  • Jan 19, 2024
  • Data & Knowledge Engineering
  • Guillaume Lefebvre + 4 more

  • Book Chapter
  • Cited by 20
  • 10.1007/978-3-642-30217-6_2
Active Learning for Hierarchical Text Classification
  • Jan 1, 2012
  • Xiao Li + 2 more

Hierarchical text classification plays an important role in many real-world applications, such as webpage topic classification, product categorization and user feedback classification. Usually a large number of training examples are needed to build an accurate hierarchical classification system. Active learning has been shown to significantly reduce the number of training examples needed, but it has not been applied to hierarchical text classification due to several technical challenges. In this paper, we study active learning for hierarchical text classification. We propose a realistic multi-oracle setting as well as a novel active learning framework, and devise several novel leveraging strategies under this new framework. The hierarchical relations between categories are explored and leveraged to improve active learning further. Experiments show that our methods are quite effective in reducing the number of oracle queries (by 74% to 90%) in building accurate hierarchical classification systems. As far as we know, this is the first work that studies active learning in hierarchical text classification with promising results.
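The simplest query strategy in active learning is uncertainty sampling. The Python sketch below implements least-confidence selection for a single classifier, for instance the local classifier at one node of a hierarchy; it is a generic baseline, not the multi-oracle framework or leveraging strategies proposed in the chapter, and the document ids and probabilities in the usage are made up.

```python
from typing import Dict, List

def least_confident(probas: Dict[str, List[float]], k: int) -> List[str]:
    """Pick the k unlabeled documents whose top predicted class is least certain.

    probas maps a document id to the classifier's class probabilities.
    """
    # A low maximum probability means the classifier is unsure about that document.
    ranked = sorted(probas.items(), key=lambda item: max(item[1]))
    return [doc_id for doc_id, _ in ranked[:k]]
```

With `{"d1": [0.9, 0.1], "d2": [0.55, 0.45], "d3": [0.6, 0.4]}` and `k=2`, it returns `["d2", "d3"]`, which would then be sent to an oracle for labeling.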

  • Conference Article
  • Cited by 7
  • 10.1109/icccbda.2017.7951950
Feature selection algorithm for hierarchical text classification using Kullback-Leibler divergence
  • Apr 1, 2017
  • Yao Lifang + 2 more

Text classification, a simple and effective method, is considered a key technology for processing and organizing large amounts of text data. Because simple (flat) text classification can no longer meet users' growing demands, hierarchical text classification has received extensive attention and has broad application prospects. Hierarchical feature selection is a key technology for automatic hierarchical text classification; general methods mainly perform feature selection for each class in the hierarchy individually and ignore the correlation between parent and child classes. This paper proposes a feature selection method based on KL divergence: it measures the correlation between a class and its subclasses with KL divergence, calculates the correlation between each feature and a subclass with mutual information, and measures the importance of subclass features using term frequency probability, in order to select a more discriminative feature set for the parent class node. We used this hierarchical feature selection method with SVM classifiers for hierarchical text categorization on two corpora. Experiments showed that the proposed algorithm was more effective than applying the χ2 statistic (CHI), information gain (IG), or mutual information (MI) directly to hierarchical feature selection.
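Two ingredients of the method above, a term-frequency distribution for a class and the KL divergence between a parent's distribution and a subclass's, can be sketched in Python as follows. The add-alpha smoothing (assumed here so that no probability in the log ratio is zero) and the toy documents are assumptions; the abstract does not say exactly how the paper combines KL divergence with mutual information and term frequency probabilities, so that step is not shown.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Set

def term_distribution(docs: Iterable[List[str]], vocab: Set[str],
                      alpha: float = 1.0) -> Dict[str, float]:
    """Add-alpha smoothed term-frequency distribution over a fixed vocabulary."""
    counts = Counter(t for doc in docs for t in doc if t in vocab)
    total = sum(counts.values()) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}

def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL(P || Q) = sum_t P(t) * log(P(t) / Q(t)), in nats."""
    return sum(pt * math.log(pt / q[t]) for t, pt in p.items() if pt > 0)
```

A parent class built from documents `[["jam", "road"], ["rain"]]` and a subclass built from `[["jam", "jam", "road"]]` give a small positive divergence, while a distribution compared with itself gives exactly 0. Note that KL divergence is asymmetric, so the direction (parent against subclass or the reverse) is a design choice.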
