Hybrid Feature Fusion Learning Towards Chinese Chemical Literature Word Segmentation

Xiang Li, Quanyin Zhu, Kewen Zhang, Yuanyuan Wang, Jialin Ma

doi:10.1109/access.2020.3049136

Abstract

The rapid growth of the chemical science literature poses challenges for researchers in search and data analysis. With so much chemical literature available, extracting information from text and putting that knowledge to use has become a research focus. However, existing Chinese word segmentation methods have a low recognition rate for chemical terms, because the many newly coined terms and the professional vocabulary that mixes Chinese and English make segmentation difficult. In this paper, we propose a word segmentation method for Chinese chemical literature based on a hybrid feature fusion learning model (HFFLM). HFFLM first establishes a chemical science corpus (Chem-pku) for training on the Chinese word segmentation (CWS) task. It then uses BiLSTM and CNN to extract document features and fuses them. Next, HFFLM combines boundary features to construct a conditional random field and trains an end-to-end CWS model. Finally, HFFLM provides a visual analysis of the word segmentation results. The experimental results indicate that HFFLM achieves high accuracy and recall and is suitable for extracting chemical industry vocabulary that mixes Chinese and English.

Highlights

In recent years, scientific research in the chemical industry has focused on monitoring data generated in the chemical production process, such as raw material data parameters, manufacturing process parameters, equipment electromechanical parameters, and abnormal diagnostic parameters [1].
This article first introduces the research significance of, and related work on, word segmentation for chemical science literature. It then describes the construction of the hybrid feature fusion model and the word segmentation process for Chinese chemical industry literature. Microsoft Research Asia's MSR corpus and the customized chemical science literature corpus Chem-pku serve as experimental data; Hidden Markov Model (HMM), Conditional Random Fields (CRF), IDCNN_CRF, BiLSTM_CRF, BiLSTM-BiLSTM, BiLSTM-CNN, and HFFLM are used to segment chemical industry literature; and the advantages of the proposed model and future work are analyzed based on the experimental results.
In view of the drawbacks of the above research, we propose a chemical literature knowledge extraction method based on hybrid feature fusion (HFFLM) to extract information from chemical science literature.

Summary

INTRODUCTION

Scientific research in the chemical industry has focused on monitoring data generated in the chemical production process, such as raw material data parameters, manufacturing process parameters, equipment electromechanical parameters, and abnormal diagnostic parameters [1]. To meet the varied needs of chemical experts, relevant information can be extracted from the chemical literature to obtain more meaningful data and to build a professional search engine for the chemical industry, which greatly helps experts' academic research. This article first introduces the research significance of, and related work on, word segmentation for chemical science literature. It then describes the construction of the hybrid feature fusion model and the word segmentation process for Chinese chemical industry literature. Microsoft Research Asia's MSR corpus and the customized chemical science literature corpus Chem-pku serve as experimental data; HMM, CRF, IDCNN_CRF, BiLSTM_CRF, BiLSTM-BiLSTM, BiLSTM-CNN, and HFFLM are used to segment chemical industry literature; and the advantages of the proposed model and future work are analyzed based on the experimental results.
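Comparing segmenters such as those listed above usually comes down to scoring predicted words against a gold segmentation. The sketch below shows one common convention, assumed here rather than taken from the paper: a predicted word counts as correct only if its character span matches a gold word exactly. The helper names `to_spans` and `prf` and the example sentence are hypothetical.

```python
def to_spans(words):
    """Turn a list of words into a set of (start, end) character spans."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def prf(gold_words, pred_words):
    """Word-level precision, recall and F1 based on exact span matches."""
    gold, pred = to_spans(gold_words), to_spans(pred_words)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: the prediction wrongly splits the chemical term.
gold = ["聚乙烯", "的", "合成"]
pred = ["聚", "乙烯", "的", "合成"]
p, r, f1 = prf(gold, pred)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Here two of the four predicted words match gold spans, so splitting a single chemical term lowers both precision and recall, which is why low recognition of domain terms shows up directly in these scores.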

RELATED WORK

Findings

CONCLUSION

Journal: IEEE Access
Publication Date: Jan 1, 2021
Citations: 31
License type: CC BY 4.0

Similar Papers

GeoBERTSegmenter: Word Segmentation of Chinese Texts in the Geoscience Domain Using the Improved BERT Model
Dongqi Wei ... Kai Ma
Earth and Space Science | VOL. 9
01 Oct 2022

Chinese Word Segmentation and Recognition Based on Separable Convolution Bidirectional Long Short-Term Memory and Feature Point
18 Dec 2020

A Word Segmentation Method of Ancient Chinese Based on Word Alignment
Chao Che ... Xiaoting Wu
01 Jan 2019

How to Enhance Chinese Word Segmentation Using Knowledge Graphs
Kunhui Lin ... Zixiang Yang
01 Aug 2018
