Hierarchical graph-based text classification framework with contextual node embedding and BERT-based dynamic fusion

Aytuğ Onan

doi:10.1016/j.jksuci.2023.101610

Abstract

We propose a novel hierarchical graph-based text classification framework that leverages the power of contextual node embedding and BERT-based dynamic fusion to capture the complex relationships between the nodes in the hierarchical graph and generate a more accurate classification of text. The framework consists of seven stages: Linguistic Feature Extraction, Hierarchical Node Construction with Domain-Specific Knowledge, Contextual Node Embedding, Multi-Level Graph Learning, Dynamic Text Sequential Feature Interaction, Attention-Based Graph Learning, and Dynamic Fusion with BERT. The first stage, Linguistic Feature Extraction, extracts the linguistic features of the text, including part-of-speech tags, dependency parsing, and named entities. The second stage constructs a hierarchical graph based on the domain-specific knowledge, which is used to capture the relationships between nodes in the graph. The third stage, Contextual Node Embedding, generates a vector representation for each node in the hierarchical graph, which captures its local context information, linguistic features, and domain-specific knowledge. The fourth stage, Multi-Level Graph Learning, uses a graph convolutional neural network to learn the hierarchical structure of the graph and extract the features of the nodes in the graph. The fifth stage, Dynamic Text Sequential Feature Interaction, captures the sequential information of the text and generates dynamic features for each node. The sixth stage, Attention-Based Graph earning, uses an attention mechanism to capture the important features of the nodes in the graph. Finally, the seventh stage, Dynamic Fusion with BERT, combines the output from the previous stages with the output from a pre-trained BERT model to obtain the final integrated vector representation of the text. This approach leverages the strengths of both the proposed framework and BERT, allowing for better performance on the classification task. The proposed framework was evaluated on several benchmark datasets and compared to state-of-the-art methods, demonstrating significant improvements in classification accuracy.

Full Text