Abstract

Semantic classification of scientific literature using machine learning approaches is challenging due to the difficulty of labeling data and the length of the texts [2, 7]. Most prior work addresses keyword-based categorization, which relies on the occurrence of important terms, whereas semantic classification requires understanding terms and the meaning of sentences in context. In this study, we evaluated neural network models on a semantic classification task using 1091 labeled EMF-related scientific papers listed in the Powerwatch study. The papers are labeled into three categories: positive, null finding, and neither. We conducted a neural architecture and hyperparameter search to find the most suitable model for the task, comparing the classification accuracy of several neural network models in our experiments. In addition, we tested two types of attention mechanisms. First, a Fully Convolutional Neural Network (FCN) was used to identify sentences in the text that are important for semantic classification. Second, the Transformer, a self-attention-based model, was tested on the dataset. The experimental results showed that the BiLSTM performed best on both unbalanced and balanced data and that the FCN was able to identify important parts of the input texts.
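The abstract names the best-performing model (a BiLSTM) but gives no implementation detail. As an illustration only, a minimal pure-Python sketch of a BiLSTM classifier producing probabilities over the three labels (positive, null finding, neither) might look like the following; every dimension, weight, and function name here is a toy assumption, not the paper's actual model:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(M, v):
    return [sum(w * x for w, x in zip(row, v)) for row in M]

def lstm_step(x, h, c, Wx, Wh):
    """One LSTM cell update; gate order in the stacked weights is i, f, g, o."""
    H = len(h)
    z = [a + b for a, b in zip(matvec(Wx, x), matvec(Wh, h))]
    i = [sigmoid(z[j]) for j in range(H)]
    f = [sigmoid(z[H + j]) for j in range(H)]
    g = [math.tanh(z[2 * H + j]) for j in range(H)]
    o = [sigmoid(z[3 * H + j]) for j in range(H)]
    c_new = [f[j] * c[j] + i[j] * g[j] for j in range(H)]
    h_new = [o[j] * math.tanh(c_new[j]) for j in range(H)]
    return h_new, c_new

def run_lstm(seq, Wx, Wh, H):
    """Run an LSTM over a sequence and return the final hidden state."""
    h, c = [0.0] * H, [0.0] * H
    for x in seq:
        h, c = lstm_step(x, h, c, Wx, Wh)
    return h

def bilstm_classify(seq, params, H=2):
    # Forward pass over the sequence plus a pass over the reversed sequence;
    # the two final hidden states are concatenated, then mapped to 3 logits.
    h_fw = run_lstm(seq, params["Wx_f"], params["Wh_f"], H)
    h_bw = run_lstm(list(reversed(seq)), params["Wx_b"], params["Wh_b"], H)
    logits = matvec(params["Wout"], h_fw + h_bw)
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]  # softmax class probabilities

def weights(rows, cols):
    # Arbitrary deterministic values standing in for trained parameters.
    return [[0.1 * (((r * cols + c) % 7) - 3) for c in range(cols)]
            for r in range(rows)]

D, H = 2, 2  # toy embedding and hidden sizes
params = {
    "Wx_f": weights(4 * H, D), "Wh_f": weights(4 * H, H),
    "Wx_b": weights(4 * H, D), "Wh_b": weights(4 * H, H),
    "Wout": weights(3, 2 * H),
}
# A toy "document": a short sequence of 2-d word embeddings.
doc = [[0.5, -0.1], [0.3, 0.8], [-0.4, 0.2]]
probs = bilstm_classify(doc, params)
print(probs)  # three probabilities: positive, null finding, neither
```

In a real setting the sequence would be word embeddings for a full paper and the weights would be learned; this sketch only shows the bidirectional read-and-concatenate structure that distinguishes a BiLSTM from a unidirectional LSTM.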
